2025-05-25

Consciousness part 4

 

In terms of consciousness, memory is generally talked about in two ways (at least in science fiction):

1. Our memory of others; and 
2. Our own memories.

Let's talk about our memories of others.  It’s common to mention how someone lives on in their work products and in the memories of those around them.

In Westworld, Dolores says, “You live as long as the last person to remember you.”

On the other hand, Woody Allen said:

“I don't want to achieve immortality through my work; I want to achieve immortality through not dying. I don't want to live on in the hearts of my countrymen; I want to live on in my apartment.”

Scott Adams, who has announced that he has terminal cancer and, absent some breakthrough treatment, has just a few months to live, has put some time into exploring the creation of an AI version of himself.  He decided it might have an uncanny-valley oddness to it, so instead he was going to treat it like an offspring.

While very few people have the public body of work that Scott Adams has, I think a "memorial AI" produced from home movies or other references will be as common as a photograph of a loved one.

I was talking to a buddy, Jared C, about how AI doesn’t learn the way people do, because AI takes so much repetition.  I suggested that the likely future – which is already happening to a certain extent, and in the two months since that conversation has grown by leaps and bounds – is that foundation models, as they are called, will be used as the basis for smaller, specialized AIs.  Those specialized AIs can learn quickly because the foundation model has already encoded the basic wiring, and the specialized AI is ‘just’ a big context window.  (The context window is the number of tokens the model remembers from talking with you.)

Jared said, and I thought this was quite profound, that maybe a million years of evolution had created the human foundation model, and that’s why people can learn quickly.  A lot of it is pre-wired and what we call our experience is a big, self-pruning context window.
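
Here is a minimal sketch of that foundation-model-plus-big-context-window idea.  Nothing in it is a real API – foundation_model_generate() is a hypothetical stand-in for whatever actually serves the foundation model – but it shows how thin the "specialized AI" layer can be:

def foundation_model_generate(context: str) -> str:
    """Hypothetical stand-in for a call to a general-purpose foundation model."""
    return f"[model reply based on {len(context)} characters of context]"

def build_specialized_context(rules: str, domain_docs: list[str], user_question: str) -> str:
    """The 'specialized AI' is nothing but the foundation model plus this assembled context."""
    return "\n\n".join([rules, *domain_docs, "User: " + user_question])

# Example: a tiny memorial-AI-style specialization
rules = "You answer in the voice of the person described in the documents below."
docs = ["Journal entry: I always started the day with coffee and the comics page."]
context = build_specialized_context(rules, docs, "What did you do most mornings?")
print(foundation_model_generate(context))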

There is some evidence that LLMs based on different human languages (e.g., English, French) tend to encode similar patterns.  In fact, the original "Transformer" model was built for translating between languages.  I believe LLMs encode knowledge because language encodes knowledge.  It trivializes the achievement to say an LLM ‘just’ predicts the next token.  Sure, but the prediction is based on a 12,000-dimensional vector space of encoded and related tokens.
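
To make that concrete, here is a toy version of the predict-the-next-token step, using random numbers in place of real model weights.  The only point is that the prediction falls out of arithmetic on a very high-dimensional vector:

import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size = 12_288, 8
vocab = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]

hidden_state = rng.normal(size=d_model)               # the final-position vector after all the transformer layers
unembedding = rng.normal(size=(vocab_size, d_model))  # maps that vector back to a score per vocabulary token

logits = unembedding @ hidden_state                   # one raw score per possible next token
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                  # softmax: turn scores into probabilities

print(vocab[int(np.argmax(probs))], probs.round(3))   # the most likely next token, plus the full distribution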

It's impossible for a human to visualize a 12,000-axis vector space.  In fact, after about five dimensions it gets really difficult.  And the vector space has just grown and grown.  I asked Grok, “Please make a table showing how the vector size has grown over time.”

+------+------------------+------------+---------------------------------+
| Year | Model            | Vector Size| Notes                           |
+------+------------------+------------+---------------------------------+
| 2013 | Word2Vec         |        300 | Early word embeddings           |
| 2017 | Transformer      |        512 | Base model from Vaswani et al.  |
| 2018 | BERT-Base        |        768 | Google’s BERT, base version     |
| 2019 | GPT-2            |       1024 | Est. based on model scale       |
| 2020 | GPT-3            |       2048 | Est. based on parameter growth  |
| 2022 | LLaMA            |       4096 | Est. for Meta’s efficient model |
| 2023 | GPT-4            |       8192 | Est. based on scaling trends    |
| 2024 | Grok 3           |      12288 | Est. for xAI’s advanced model   |
+------+------------------+------------+---------------------------------+

As you can see, the dimensions of the vector space have generally grown, but may have topped out in a useful "sweet spot" between 8,000 and 12,000 dimensions, at least for most applications.
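
Since nobody can picture 12,000 axes directly, the usual trick is to project down to two or three dimensions and accept the loss.  A rough sketch of that, using random stand-in vectors rather than real model embeddings:

import numpy as np

rng = np.random.default_rng(1)
embeddings = rng.normal(size=(100, 12_288))   # 100 made-up token vectors, 12,288 dimensions each

centered = embeddings - embeddings.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
projected_2d = centered @ vt[:2].T            # PCA via SVD: keep the two most informative axes

print(projected_2d.shape)                     # (100, 2) -- small enough to plot on paper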

As foundation models have grown in their ability to categorize knowledge into these massive vector spaces, so too has the size of the context window.  I mentioned before that the context window is the text you put into the model - your questions and/or observations.  But the designer of the AI also puts text into the context window - rules of behavior.  The context window can come from anywhere.  For Perplexity, and other models that are adding search capabilities, it also comes from the results of a web search. 
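
Mechanically, the context window is just an assembled string (or token list) with a budget.  Here is a simplified sketch – the whitespace "tokenizer" and the 8,192-token budget are placeholders for illustration, not how any particular product does it:

MAX_TOKENS = 8_192

def rough_token_count(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def assemble_context(rules: str, search_results: list[str], user_text: str) -> str:
    pieces = [rules, *search_results, user_text]
    kept, used = [], 0
    for piece in pieces:
        cost = rough_token_count(piece)
        if used + cost > MAX_TOKENS:
            break  # whatever no longer fits in the window is simply dropped
        kept.append(piece)
        used += cost
    return "\n\n".join(kept)

print(assemble_context(
    rules="Answer politely and cite your sources.",
    search_results=["Search result: context windows have grown rapidly since 2017."],
    user_text="How big are context windows today?",
))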

I asked Grok, “Produce for me, please, a historical table of LLM context window sizes.”

Below is a historical table of context window sizes for notable large language models (LLMs), based on available data up to May 2025. The context window, measured in tokens, represents the maximum amount of text an LLM can process at once, influencing its ability to handle long conversations, documents, or complex tasks. The table is organized chronologically by model release or significant update, with approximate release dates where possible.


+------+------------------+------------+---------------------------------+
| Year | Model            | Ctx Window | Notes                           |
+------+------------------+------------+---------------------------------+
| 2017 | Transformer      |        512 | Base model, Vaswani et al.      |
| 2018 | ELMo             |        512 | Est. for bidirectional LSTM     |
| 2018 | BERT-Base        |        512 | Google’s BERT, base version     |
| 2019 | GPT-2            |       1024 | OpenAI’s early generative model |
| 2020 | GPT-3            |       2048 | Initial release, ~1.5k words    |
| 2022 | LLaMA 2          |       4096 | Meta’s efficient model          |
| 2023 | GPT-4            |      32768 | 32k version, ~24k words         |
| 2024 | Gemini 1.5       |    1000000 | Google’s 1M token model         |
+------+------------------+------------+---------------------------------+

It’s been several months (six or more) since Scott Adams looked into encoding his vast library of publications (books and podcasts) into a “memorial AI” of himself.  I think it’s probably doable now, and it will just get easier, because of larger vector spaces and larger context windows.

NotebookLM from Google has a context window big enough to upload my book, Nano-Plasm (free copy at https://www.above-the-garage.com/nano-plasm/Nano-Plasm_v1_1_3_2008-12-31.pdf).  I uploaded the PDF version which it handily decoded.  It summarized the book in less than a second.  Yay for large context windows!  [Update:  NotebookLM identified some typos in the book - actually a lot of them!  But each one it identified that I checked was actually fine.  I think the PDF parser is busted!  Each typo it thought it saw was a letter replaced with a space.  I'll try again later with a .txt file.  You just can't trust LLMs.]
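
If I wanted to test the "letter replaced with a space" theory programmatically, something like this would do it – the tiny word list is a stand-in for a real dictionary:

KNOWN_WORDS = {"plasm", "nano", "garage", "memory", "consciousness"}

def suspicious_splits(extracted_text: str) -> list[str]:
    """Flag adjacent fragments that form a known word when rejoined - likely parser artifacts."""
    tokens = extracted_text.lower().split()
    return [a + b for a, b in zip(tokens, tokens[1:])
            if a + b in KNOWN_WORDS and a not in KNOWN_WORDS]

print(suspicious_splits("the nano pla sm experiment"))  # ['plasm'] - a parser artifact, not a typo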

Current LLMs can store enough information to recreate how others remember us.  Dolores in Westworld says,

"If you could see the code, the way it was written, you’d see the beauty of it. It’s elegant, simple… a single line of code can describe everything that person was. Everything they did. For you humans, that’s like… the equivalent of reading their life story in about 30 pages. Maybe less."

In the next article in this series, I will discuss our own memories.


(See the "What model" blog post to see how much I knew about foundation models.)
