Good summary of LLM memory issues:
From https://simple.ai/p/the-ai-memory-problem: "The next real breakthrough probably isn't going to come from cleverer prompts or better instructions. It's going to come from solving AI's memory problem." The article covers:
Why AI systems fundamentally have no memory
The current workaround: using context as a memory substitute
How RAG (Retrieval-Augmented Generation) scales this approach
Why real AI memory is the next frontier
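To make the RAG point above concrete, here is a minimal sketch of the pattern: embed stored passages, retrieve the ones most similar to the query, and prepend them to the prompt so they act as working memory. The corpus and the bag-of-words embedding are toy stand-ins of mine; a real system would use a learned embedding model and a vector database.

```python
# Minimal RAG sketch: retrieved passages ride along in the context window.
# The bag-of-words embedding is a toy stand-in for a learned embedding model.
import re
import numpy as np

corpus = [
    "The user's name is Alice.",
    "Alice prefers answers in French.",
    "The project deadline is March 3rd.",
]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

vocab = sorted(set().union(*(tokens(d) for d in corpus)))

def embed(text: str) -> np.ndarray:
    """Toy embedding: normalized bag-of-words vector over the corpus vocabulary."""
    t = tokens(text)
    v = np.array([float(w in t) for w in vocab])
    n = np.linalg.norm(v)
    return v / n if n else v

corpus_vecs = np.stack([embed(d) for d in corpus])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages with the highest cosine similarity to the query."""
    sims = corpus_vecs @ embed(query)      # cosine similarity (unit vectors)
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]

query = "When is the deadline?"
prompt = "Context:\n" + "\n".join(retrieve(query)) + f"\n\nQuestion: {query}"
print(prompt)   # retrieved facts become the model's working "memory"
```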
A Google paper on this topic!
https://arxiv.org/html/2507.16003v1
And a Google AI summary of it!
Based on the research paper you're viewing, here is an explanation of the concept of "weight updates":
In the context of this paper, a weight update is an implicit modification of a neural network's parameters that occurs at inference time (when the model is being used), rather than during training. The authors argue that this is the mechanism behind in-context learning (ICL) in Large Language Models (LLMs).
Here are the key ideas:
Implicit vs. Explicit Updates: Traditionally, a model's weights are updated explicitly during training using an optimization algorithm like gradient descent. This paper proposes that when an LLM is given a prompt with examples (a context), it performs an implicit weight update on the fly, without undergoing further training.
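For reference, an explicit update looks like this: one gradient descent step on a generic least-squares loss (a textbook example of mine, not code from the paper). The implicit updates described next are claimed to happen without any such step being run.

```python
# An *explicit* weight update: one gradient descent step during training.
# Generic least-squares example, just to fix notation.
import numpy as np

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 8))   # trainable weights
x = rng.standard_normal(8)              # a training input
y = rng.standard_normal(4)              # its target output

grad = np.outer(W @ x - y, x)           # gradient of 0.5 * ||W @ x - y||^2 w.r.t. W
W -= 0.1 * grad                         # the stored weights themselves change
```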
The Role of Contextual Blocks: The paper introduces the idea of a "contextual block," which generalizes the standard transformer block. This block consists of a contextual layer (like self-attention) followed by a neural network (like an MLP). The contextual layer processes the input prompt and transforms it into a low-rank weight update that is applied to the MLP layer.
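The identity behind this, as I reconstruct it from the summary, can be checked in a few lines: whatever the context adds to the contextual layer's output can be folded into a rank-1 change to the MLP weight matrix. The vectors standing in for the attention outputs A(x) and A(C, x) below are random placeholders, not real transformer activations.

```python
# Sketch of the implicit-update identity: the context's contribution to the
# attention output can be absorbed into a rank-1 update of the MLP weights.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))      # first-layer MLP weights
a_plain = rng.standard_normal(8)     # placeholder for A(x): attention output, no context
a_ctx = rng.standard_normal(8)       # placeholder for A(C, x): attention output with context
delta = a_ctx - a_plain              # what the context added to the activation

# Rank-1 weight update that reproduces the context's effect:
dW = np.outer(W @ delta, a_plain) / (a_plain @ a_plain)

lhs = W @ a_ctx                      # context carried in the activations
rhs = (W + dW) @ a_plain             # context folded into the weights instead
assert np.allclose(lhs, rhs)
print(np.linalg.matrix_rank(dW))     # -> 1: a low-rank (rank-1) update
```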
Learning Dynamics: When the context is a sequence of tokens, these weight updates happen iteratively as each token is processed. The paper shows that this sequence of updates resembles the dynamics of online gradient descent, where each token effectively acts as a data point that refines the model's weights. As more of the context is processed, these updates become smaller and eventually converge.
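A toy analogy for that claim (my construction, not the paper's derivation): run online gradient descent where each context "token" plays the role of one data point, with a decaying step size, and watch the per-token update shrink as the context accumulates.

```python
# Toy analogy: each context "token" acts as one data point in online
# gradient descent, and the per-token weight updates shrink over time.
import numpy as np

rng = np.random.default_rng(0)
W_true = rng.standard_normal((4, 8))    # the pattern hidden in the context
W = np.zeros((4, 8))                    # weights before any context is seen

for t in range(1, 21):                  # 20 context tokens
    x = rng.standard_normal(8)          # token t, treated as a data point
    y = W_true @ x                      # the signal it carries
    grad = np.outer(W @ x - y, x)       # gradient of 0.5 * ||W @ x - y||^2
    step = (0.1 / t) * grad             # decaying step size
    W -= step
    if t % 5 == 0:
        print(f"token {t:2d}: update norm {np.linalg.norm(step):.4f}")
```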
In essence, the paper posits that in-context learning is not just about retrieving information learned during pre-training. Instead, the model implicitly fine-tunes its own MLP weights based on the examples provided in the prompt, allowing it to adapt and learn new patterns at inference time.