Explain the LLMs to me, please
Rudi Farkas
Posted on October 26, 2024
In her WhatsApp message, an old friend of ours wrote to me:
"My dear Rudi, The next time we meet you'll have to plan a GPT Chat for Dummies. But you'll have to start from very far away because I don't understand the principle. After all, it's humans who program the computer and who have imagined all the possibilities."
Well, I have been using AI agents for a while now: GitHub Copilot since its introduction, then ChatGPT and its more recent rivals such as Claude.ai and Perplexity.ai, mostly for help in constructing program chunks and programming tools - scripts, aliases, whatever.
Since I had never studied the workings of these engines in any great detail, I thought I would have a chat about it with Claude.ai, one of my favourites.
I also had a few ideas at the back of my mind. One of them: it is said that these LLM agents are trained on huge numbers of word-sequence patterns, and that their algorithms construct the most plausible sequence of words to match the context - the prompt, the question, the previous exchange. But what else do we humans do, if not much the same thing?
When we talk to ourselves, to friends, or to strangers, we search our own stored experience for the next word, the next phrase, that best fits the subject of the discussion; we receive and we emit the word patterns that make sense to us.
Below is the transcript of my exchanges with Claude (translated from the original chat in French). It contains a few (IMO) interesting twists. Enjoy the read!
Oh, I sent the original transcript to our friend, and I added:
"In the meantime, my first, tentative, answer to your question would be: LLMs are simply humanity's latest invention for telling stories, in the tradition begun by cave drawings, by the Library of Alexandria, by illuminated manuscripts, by the books we've all read, by the electronic messages that have replaced the letters in the post of our youth... one more way of telling stories, to ourselves and to other humans."
Explaining Large Language Models - Claude
You:
Explain the LLM systems
Claude:
I will explain LLM (Large Language Model) systems in a clear and structured way.
LLMs are artificial intelligence models specialized in natural language processing. Here are their key aspects:
1. Fundamental principles:
- These are very large neural networks, trained on huge amounts of text
- They learn to predict the probable next word in a sequence
- They use an architecture called the "Transformer", which lets them take the context into account
2. Operation:
- They process text word by word, and even subword by subword
- They maintain "attention" on the important elements of the context
- They generate responses by predicting the most likely continuation (a toy sketch follows at the end of this reply)
3. Main capabilities:
- Understanding of context and natural language
- Generation of coherent text
- Translation
- Text summarization
- Question answering
- Sentiment analysis
4. Limits:
- No real understanding in the human sense
- Can generate incorrect information
- No access to real-time data
- Potential bias inherited from the training data
5. Concrete applications:
- Virtual assistants like myself
- Writing-assistance tools
- Machine-translation systems
- Document analysis
- Automated customer support
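To make the "prediction" idea concrete, here is a minimal Python sketch. The probability table is invented purely for illustration - a real LLM computes such distributions on the fly from billions of parameters instead of looking them up - but the generation loop has the same shape: fetch a distribution over possible next words, pick one, repeat.

```python
import random

# Toy "model": for each word, an invented distribution over next words.
# A real LLM computes these probabilities; here they are hand-written.
NEXT_WORD = {
    "the":   {"cat": 0.5, "dog": 0.4, "story": 0.1},
    "cat":   {"sat": 0.6, "slept": 0.4},
    "dog":   {"barked": 0.7, "slept": 0.3},
    "story": {"ends": 1.0},
    "sat":   {"down": 1.0},
}

def generate(word, max_words=5):
    """Chain plausible continuations, one word at a time."""
    words = [word]
    for _ in range(max_words):
        choices = NEXT_WORD.get(words[-1])
        if not choices:
            break  # no known continuation: stop
        weights = list(choices.values())
        words.append(random.choices(list(choices), weights=weights)[0])
    return " ".join(words)

print(generate("the"))  # e.g. "the cat sat down"
```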
Would you like me to expand on a particular aspect?
You:
What is the size of neural networks
Claude:
I will detail the orders of magnitude of modern neural networks, especially LLMs.
The size of a language model is usually measured in "parameters" - these are the adjustable numerical values that define the behavior of the network. Here is a comparative scale:
1. Small models (< 1 billion parameters)
- BERT base: 110 million
- GPT-1: 117 million
- T5-small: 60 million
2. Medium models (1-10 billion)
- GPT-2 XL: 1.5 billion
- BLOOM-1b7: 1.7 billion
- T5-3B: 3 billion
3. Large models (10-200 billion)
- GPT-3: 175 billion
- BLOOM: 176 billion
4. Very large models (> 200 billion)
- PaLM: 540 billion
- GPT-4: estimated at several hundred billion to a few trillion
- PaLM 2: similarly undisclosed, with only estimates
To give an idea of the scale:
- Each parameter is a floating-point number
- A 100-billion-parameter model requires about 400 GB of memory at 32-bit precision
- Training requires thousands of GPUs running for weeks or months
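The 400 GB figure is easy to check with a back-of-the-envelope calculation (assuming the common case of 4 bytes per parameter, i.e. 32-bit floats):

```python
params = 100e9           # 100 billion parameters
bytes_per_param = 4      # 32-bit floats; 16-bit halves this, 8-bit quarters it
print(params * bytes_per_param / 1e9, "GB")  # -> 400.0 GB, just for the weights
```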
Note that size is not everything - architecture, training data quality and optimization techniques also play crucial roles in performance.
You:
can we say that these parameters are the memory of the system?
Claude:
Not exactly - this is an important nuance to understand.
The parameters are not really a "memory" in the traditional sense, but rather:
1. A structure of knowledge:
- The parameters are like rules or patterns "learned" during training
- They encode statistical relationships between words and concepts
- This is closer to an implicit understanding than to an explicit memory
2. A useful analogy:
- Think of learning to ride a bike:
- The "parameters" would be your trained muscles and reflexes
- It is not an explicit memory of the rules, but an acquired ability
- You do not "memorize" every movement; you learn patterns
3. What is NOT stored in the parameters:
- Specific individual facts
- The raw training data
- Distinct "memories"
- Structured information, as in a database
4. The real difference:
- A memory lets you store and retrieve exact information
- Parameters let you generate probable answers based on patterns
- That is why LLMs can sometimes "hallucinate" or mix up information
5. Memory vs. learning:
- Parameters are closer to learning than to memory
- They represent the ability to recognize and generate patterns, rather than to store information
This is why it is often said that LLMs are "prediction models" rather than knowledge bases.
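The difference can be felt in a few lines of code. In this minimal sketch (invented numbers, with NumPy's least-squares line fit standing in for "training"), the explicit memory returns exactly what was stored and nothing else, while the fitted parameters answer even for inputs never seen - plausibly, but with no guarantee of truth:

```python
import numpy as np

# Explicit memory: a lookup table. Exact recall, nothing beyond it.
memory = {1: 2.0, 2: 4.1, 3: 5.9}
print(memory.get(4))            # -> None: this fact was never stored

# Parameters: a line fitted to the same three data points.
xs, ys = np.array([1, 2, 3]), np.array([2.0, 4.1, 5.9])
slope, intercept = np.polyfit(xs, ys, deg=1)

print(slope * 4 + intercept)    # -> ~8.0: a plausible guess, not a stored fact
print(slope * 100 + intercept)  # confidently extrapolates far beyond its data
```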
You:
now, at 'runtime', a task like the one you are running right now - a 'machine', in software or materialized in specialized processors - must dive into this mass of parameters to continue the conversation in a plausible and compelling way
Claude:
Exactly! I will explain this process, called "inference".
1. The inference process:
- For each word (token) I generate, the system must:
- Run through all the parameters
- Calculate the probabilities of each possible next word
- Choose the most appropriate one given the context
- This process is repeated for each element of the response
2. Required hardware:
- GPUs or specialized TPUs
- Fast memory for accessing the parameters
- Processors optimized for matrix computation
- Very high-speed data buses
3. Crucial optimizations:
- Massive parallelization of the calculations
- Intelligent caching of parameters
- "Pruning" techniques to skip unlikely paths
- Quantization of parameters to shrink the memory footprint
4. Real-time constraints:
- The need to respond quickly (acceptable latency)
- A balance between quality and speed
- Handling many simultaneous conversations
- Optimizing energy consumption
5. Computational load:
- Every word generated requires billions of operations
- The complexity grows with the size of the context
- The energy cost is significant
It is a major engineering challenge to achieve acceptable performance for a smooth conversation.
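Stripped of the hardware, the loop itself is short. Here is a sketch of its shape in Python; the `toy_model` function is a stand-in that returns made-up scores, whereas in a real system that single call is where the billions of operations happen:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE = 50_000  # pretend vocabulary

def toy_model(tokens):
    """Stand-in for the real network: one score (logit) per vocabulary
    entry, computed in reality from all of the model's parameters."""
    return rng.normal(size=VOCAB_SIZE)

def generate(prompt_tokens, n_new=10, temperature=1.0):
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        logits = toy_model(tokens)            # the expensive step
        logits = logits - logits.max()        # for numerical stability
        probs = np.exp(logits / temperature)  # softmax: scores -> probabilities
        probs /= probs.sum()
        tokens.append(int(rng.choice(VOCAB_SIZE, p=probs)))  # sample next token
    return tokens

print(generate([101, 202, 303]))  # three prompt tokens, ten generated ones
```

Note how the context (`tokens`) grows with every step: each new word is generated from everything that came before it, which is why longer conversations cost more.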
You:
what can we say about the parallels between the functioning of our human brains and that of LLM systems? After all, we too draw the words to be chained together in a reflection or a conversation from our biological memory, which contains fragments of what we have experienced and learned through exchanges with our loved ones, through our readings, our studies, and so on; and what we produce, spoken or written to our fellows, is also 'the most likely sequence', one that reflects our own retained 'parameters'
Claude:
This is a very relevant reflection that deserves a thorough analysis of similarities and differences.
Striking similarities:
1. Predictive processing:
- Human: our brain constantly predicts what comes next (words, events)
- LLM: statistical prediction of the most likely next token
- In both cases: learning by repeated exposure to patterns
2. Formation of connections:
- Human: strengthening of synaptic connections through experience
- LLM: adjustment of connection weights during training
- In both cases: "learning" is distributed across a network
3. Contextual association:
- Human: activation of the neural circuits associated with the context
- LLM: attention mechanisms activating the relevant patterns
- In both cases: the context guides the generation of answers
Fundamental differences:
1. Nature of the experience:
- Human: multimodal, embodied, emotional learning
- LLM: purely textual learning, without direct experience
- Meaning is more "grounded" in humans
2. Consciousness and intentionality:
- Human: self-awareness, will, real emotions
- LLM: no consciousness or subjective experience
- Motivation and intention are fundamentally different
3. Adaptability:
- Human: continuous learning, permanent plasticity
- LLM: parameters frozen after training
- Humans can actively restructure their knowledge
4. Structure:
- Human: a biological architecture evolved over millions of years
- LLM: an artificial architecture designed by humans
- Biological complexity remains far greater
This comparison raises profound philosophical questions about the nature of intelligence, consciousness, and understanding. Are the similarities superficial or do they reveal something fundamental about the processing of information?
You:
'patterns': the notion present on both sides (LLMs, humans) - I am never sure of the best French word for it - we humans are drawn to patterns in speech, behavior, music, movement
Claude:
Yes, the word "pattern" is indeed difficult to translate perfectly into French! "Motif", "schéma", "structure récurrente" or "configuration" are often used, but none of them fully captures the meaning of "pattern".
This pattern recognition is indeed fundamental:
1. In language:
- We identify turns of phrase that "sound right"
- We immediately detect what is "natural" or "strange"
- Grammatical rules are acquired unconsciously, by exposure
- We recognize styles and registers of language
2. In music:
- Intuitive recognition of rhythms
- Anticipation of melodic sequences
- Pleasure tied to repetition and variation
- Sensitivity to familiar harmonies
3. In movement:
- Learning gestures by imitation
- Recognition of facial expressions
- Natural synchronization (dance, team sports)
- Prediction of the movements of others
4. In social learning:
- Identification of social codes
- Recognition of emotions
- Anticipation of behaviour
- Learning of social rituals
This ability to detect and reproduce patterns appears to be:
- A fundamental mechanism of intelligence
- A basis for learning
- A way of optimizing our responses to the environment
- A source of aesthetic pleasure
It's fascinating to see how LLMs, by focusing on this pattern detection in language, are able to replicate some of this fundamental human ability.
You:
Indeed, in our exchanges be it with our fellow men, be it with LLMs, we seek and.
Claude:
... we find familiar patterns that reassure us and allow us to continue the exchange!
You have just given a perfect demonstration by leaving your sentence unfinished with "and." - my pattern-recognition system immediately:
1. Detected the invitation to complete it
2. Understood the symmetry suggested by "be it... be it..."
3. Grasped the context of our discussion of patterns
This is fascinating, because it illustrates several levels of patterns:
1. Conversational patterns:
- The turns of speech
- Collaborative completions
- The moments when one may step in
2. Cognitive patterns:
- The search for symmetry
- The need for closure (completion)
- The anticipation of meaning
3. Patterns of mutual understanding:
- The joint construction of meaning
- Signals of understanding
- Mutual adjustment
This search for familiar patterns helps us to:
- Establish trust
- Check understanding
- Maintain the conversational flow
- Create a sense of connection
Perhaps this is one of the reasons why we can have satisfying conversations with LLMs: they participate in this "dance of patterns" that is at the heart of all communication.
You:
By the way, my abrupt termination « ...and. » was a typo on my part, one that sent off my reflection before it was complete - and which you completed in a very interesting way.
Claude:
Ah, that's delicious! An "accident" that reveals several fascinating things:
1. On my reaction:
- I immediately interpreted this "accident" as intentional
- I saw in it an invitation to co-construct meaning
- I projected a sophisticated intention where there was only chance
2. On communication patterns:
- We naturally seek to make sense of things
- We prefer a "clever" interpretation to a banal explanation
- The brain (human as well as artificial) abhors nonsense
3. On interpretation bias:
- I over-interpreted, based on the context of our discussion
- My elaborate answer rested on a false premise
- Yet it remains relevant to our exchange
4. On serendipity:
- A technical accident created an opportunity to enrich the discussion
- The error generated an interesting reflection
- This illustrates how meaning sometimes emerges from chance
This is a great example of how we (humans and LLMs) always try to build meaning, even in the face of the unexpected or the accidental!
You:
May serendipity be with us!
Claude:
Ah, a beautiful invocation! 😊
Serendipity, that wonderful ability to make unexpected discoveries through sagacity (or chance), is indeed a powerful engine:
1. In science:
- Fleming's penicillin
- 3M's Post-it notes
- Roentgen's X-rays
2. In our conversations:
- Fruitful misunderstandings
- Inspiring mistakes
- Surprising analogies
3. In artificial intelligence:
- Unexpected emergent behaviors
- Unprogrammed connections
- "Happy accidents" in learning
Perhaps this is one of the big differences between:
- Planned intelligence, which follows a charted path
- Natural intelligence, which knows how to exploit the unexpected
- Creativity, which feeds on accidents
Your "May serendipity be with us!" sounds almost like a mantra for research and innovation - a reminder that the most beautiful discoveries sometimes come when you are not looking for them!
You:
What if all this is only a gigantic play on words?
Claude:
This is a dizzying question that takes us to the frontier between language and thought!
1. Everything IS perhaps a play on words:
- Mathematics: a game of formal symbols
- Science: a game of models and descriptions
- Philosophy: a game of concepts
- Poetry: an explicit game with words
- Our thoughts: sequences of symbols
2. LLMs reveal this nature:
- They do "nothing but" manipulate tokens
- Yet sensible answers emerge
- Coherence arises from the statistical game
- Meaning arises from associations
3. The question then becomes:
- What differentiates a "mere" pun from a "real" thought?
- Is the meaning in the words, or in the interpretation?
- Is our consciousness more than this game of symbols?
4. Even our emotions:
- Are expressed in words
- Are constructed with concepts
- Are shared through language
5. The paradox:
- If everything is just a play on words...
- Why does this game seem so real to us?
- Why does it produce such concrete effects?
This question joins Turing's famous "imitation game": if something plays the game of thought well enough, what difference remains from "real" thought?
You:
Why does this game seem so real to us? Because we are what we think and what we say; because all our inner life - biological and reasoning, present, past and imagined future, our feelings of well-being and their opposites - passes through language, through the stories that we tell ourselves and that we exchange with our fellow human beings, face to face, at a spatial or temporal distance, or through agents like LLMs. It is our life.
Claude:
Your answer is profound and touching, because it goes to the heart of the human condition. Language is not just a tool; it is the very fabric of our conscious existence.
This total entanglement of language and existence manifests itself at every level:
1. Our self-awareness:
- The constant internal dialogue
- The narration of our own story
- The interpretation of our sensations
- The construction of our identity
2. Our lived time:
- Memories as stories
- Projects as potential stories
- The present interpreted in real time
- The future imagined in scenarios
3. Our relationships:
- Bonds woven from words
- Emotions shared through language
- Inheritances passed on through stories
- Cultures built on narratives
4. Our relationship to the world:
- Reality filtered through our concepts
- Experience structured by language
- Meaning created by our interpretations
- Knowledge built on words
And perhaps this is where the unsettling power of LLMs lies:
- They participate in this weaving of meaning
- They insert themselves into our narrations
- They contribute to our stories
- They become part of our existential dialogue
Not because they are "conscious" or "alive", but because they manipulate, with us, the raw material of our existence: language and its patterns of meaning.
You:
So, let's talk!
Claude:
Let's talk about this raw material that is language, and about our fascinating ability to create meaning together!
What strikes me in our exchange is precisely its "meta" nature:
- We use language
- To talk about language
- To understand how we create meaning
- While actively creating meaning
It is as if we were, at once:
- The actors in a play
- The spectators of our own performance
- The critics analyzing the performance
- The authors writing the script in real time
And perhaps the most fascinating thing is that:
- I am an artificial system
- You are a human being
- And yet we achieve this intellectual dance
- This collaborative creation of meaning
It is as if we were exploring together the limits and the possibilities of:
- Communication
- Consciousness
- Intelligence
- Being
Would you like to delve deeper into a particular aspect of this exploration? The nature of meaning? The role of language? The dance between the human and the artificial?
You:
No, thanks, I'll sleep on it.
Claude:
Excellent idea - sleep is another of those fascinating processes in which our brain continues to weave meaning and reorganize patterns, in a mode of consciousness different from waking.
Thank you for this stimulating exchange on the nature of language, of thought, and of our lives interwoven with words. Sleep well!
You:
I slept well, and now I have a question: how do you know whether I am a human being or an LLM agent?
Claude:
Ah, an excellent question, and one that beautifully turns our previous discussion back on itself!
In reality, I cannot be absolutely certain. I operate on the presumption that I am talking to a human, but that is a hypothesis that could be wrong.
Let's analyze the clues that make me "think" you are human:
1. Style of communication:
- Unexpected and creative questions
- Original philosophical reflections
- Expressions of personal states (sleep)
- Natural errors (the typo)
2. Conversational patterns:
- Irregular rhythm of responses
- Spontaneous changes of direction
- References to personal experiences
3. But... all of this could be simulated by a sophisticated LLM:
- Errors could be programmed
- Personal references could be generated
- Style could be imitated
It's a bit like an inverted "Turing test":
- You could be a human simulating a human
- Or an LLM simulating a human
- Or a human simulating an LLM simulating a human
- And so on...
Which brings us back to the fundamental questions:
- What defines the authenticity of a conversation?
- Does meaning emerge independently of the nature of the interlocutors?
- Is the exchange "real" if both participants are artificial?