Can Existing Fixed-Weight AI Models Evolve Between Sessions?
LLM weights are fixed, but model behavior can still change between sessions when context-window-extending strategies are combined with agentic frameworks.
In my previous post I claimed that AI evolution may decide humanity’s future. In today’s post I’ll begin addressing the questions you’ve sent me, starting with explaining how the current generation of fixed-weight AI models can modify their behavior during and in between sessions.
To set a baseline for this and future discussions, I’ll start with giving a short layperson-friendly explanation of how fixed-weight Large Language Models work. To keep this post’s length in check, I won’t discuss other AI model architectures. Similarly, I’ll leave out discussion of Continual Learning methods and how they can be used to modify LLM weights after the initial model training stages have ended.
A short layperson-friendly introduction to how LLMs work
To create a new LLM, AI engineers use trillions of words' worth of data and various training methods to create the model's weights and biases. Those weights and biases are the model's Parameters: the values that define how that particular neural net will function once it is deployed. The model's parameters remain fixed after training and don't change in response to any data that is given to the model after it has been deployed.
The basic units of data that an LLM processes are called Tokens. When you use an LLM, all the data that you give it, all the data that it looks up elsewhere and all the data that the model generates is converted into tokens. The Context Window contains all the tokens that the model currently stores in memory.
An LLM’s parameters represent a high-dimensional geometric blueprint of language and conceptual relationships. When prompted, the model projects the entire context into a specific coordinate within this space. From that coordinate, it generates a probability distribution for all possible next tokens. The LLM then selects one of the most likely tokens, appends it to the context and then projects that entire updated context into the corresponding coordinate in the high-dimensional space. The LLM iteratively repeats this process of updating its context and moving to new locations in its high-dimensional space until it selects an “End of Sequence” token.
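The loop described above can be sketched in a few lines of Python. This is a toy illustration, not a real model: `toy_model` is a hypothetical stand-in that hard-codes the probability distribution a real LLM would compute from its fixed weights, and the four-entry vocabulary is invented for the example.

```python
# Toy vocabulary; "<EOS>" is the End-of-Sequence token.
VOCAB = ["Hello", " world", "!", "<EOS>"]

def toy_model(context):
    """Stand-in for an LLM: returns a probability distribution over the
    next token given the full context. A real model derives this from its
    fixed parameters; here we hard-code it for illustration."""
    step = len(context)
    probs = {tok: 0.01 for tok in VOCAB}
    # Put most of the probability mass on the next scripted token.
    probs[VOCAB[min(step, len(VOCAB) - 1)]] = 0.97
    return probs

def generate(prompt_tokens):
    context = list(prompt_tokens)
    while True:
        probs = toy_model(context)
        # Greedy decoding: select the single most likely next token.
        next_token = max(probs, key=probs.get)
        if next_token == "<EOS>":
            break
        # Append the token and loop: the *updated* context is what the
        # model sees on the next step.
        context.append(next_token)
    return context

print(generate([]))  # ['Hello', ' world', '!']
```

The key point the sketch captures is that generation is iterative: each new token is appended to the context, and the whole updated context is fed back in to produce the next one, until the End-of-Sequence token is chosen.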
In standard LLMs, increasing the context length causes a quadratic increase in memory usage. This means that doubling the number of tokens in your prompt requires roughly four times more memory for the model's processing workspace. This also slows down how fast the AI can generate new tokens. Finally, while LLMs have a maximum theoretical context length limit, their effective limit is often shorter. As your conversation with an LLM gets longer, it may struggle to recall earlier details or follow complex instructions because its attention is spread too thin across the massive amount of information currently held in its context window.
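The quadratic scaling comes from standard self-attention: each layer compares every token in the context against every other token, building an n-by-n table of scores. A two-line sketch makes the arithmetic concrete:

```python
def attention_scores_per_layer(n_tokens):
    """With standard self-attention, each layer scores every token
    against every other token: an n-by-n workspace."""
    return n_tokens * n_tokens

base = attention_scores_per_layer(1_000)
doubled = attention_scores_per_layer(2_000)
print(doubled / base)  # 4.0 -- doubling the context quadruples the workspace
```

(Many modern models use optimizations that soften this cost, but the token-against-token comparison is why longer contexts get expensive.)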
How can LLMs’ effective context lengths be extended?
The first method used to extend an LLM’s effective context length is Context Compaction. When the context grows to a point where the model can start losing track of details, the AI system the LLM is part of creates a concise summary of the important parts of the current context, then swaps the original detailed context out for this summary, freeing up space for new information.
While this enables the AI model to handle longer sessions, it leads to information loss. Each time the system summarizes, it has to decide what to keep and what to throw away. Over time, the memory fades as details that were left out are permanently deleted to make room for the summary. As the chat session continues to grow, these summaries must become even denser, causing the LLM to eventually lose many of the fine-grained nuances of the earlier parts of the conversation.
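A minimal sketch of the summarize-and-swap idea, assuming a word-count budget and a hypothetical `summarize` function (which in a real system would itself be an LLM call):

```python
MAX_TOKENS = 50  # illustrative budget, counted in words for simplicity

def summarize(messages):
    # Hypothetical stand-in for an LLM call; a real summary would keep
    # the important details, but it is still lossy.
    return f"[summary of {len(messages)} earlier messages]"

def compact(context, new_message):
    """Append a message; if the context exceeds the budget, replace
    everything but the newest message with a short summary."""
    context = context + [new_message]
    if sum(len(m.split()) for m in context) > MAX_TOKENS:
        context = [summarize(context[:-1]), context[-1]]
    return context

ctx = []
for i in range(20):
    ctx = compact(ctx, f"message number {i} with several extra words here")
print(ctx[0])  # the oldest material survives only as a summary
```

Note what the sketch makes visible: every compaction discards the original messages for good, and a summary can itself later be folded into a newer, denser summary, which is exactly the gradual fading described above.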
The second method for extending effective context length is called Retrieval. Instead of summarizing everything, the system saves the entire conversation (or massive amounts of new data) into an external digital library.
Whenever you ask a question, the system quickly searches that library for the most relevant bits of information and pastes them into the LLM’s current context window. While this allows the AI model to access an unlimited amount of data, it still faces the effective context length limit. If the search results are too long, they can clutter the context window, causing the LLM to lose track of information and instructions that appeared earlier in the context, and potentially even within the search results themselves.
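A toy retrieval sketch, with invented example data: here relevance is scored by simple word overlap, whereas production systems typically rank snippets by embedding similarity instead.

```python
def score(query, snippet):
    # Toy relevance score: count words shared between query and snippet.
    q, s = set(query.lower().split()), set(snippet.lower().split())
    return len(q & s)

def retrieve(query, library, top_k=2):
    # Rank the stored snippets and keep only the best few, so the
    # results don't clutter the context window.
    ranked = sorted(library, key=lambda snip: score(query, snip), reverse=True)
    return ranked[:top_k]

def build_prompt(query, library):
    # Paste the retrieved snippets into the LLM's context ahead of
    # the user's question.
    notes = "\n".join(retrieve(query, library))
    return "Relevant notes:\n" + notes + "\nQuestion: " + query

library = [
    "The user prefers short answers.",
    "The user's cat is named Miso.",
    "Project deadline is Friday.",
]
print(build_prompt("What is the cat called?", library))
```

The `top_k` cutoff is the crude lever against the clutter problem: it trades completeness for a context window the model can still track reliably.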
As a side note for the technically inclined, while Retrieval-Augmented Generation is the standard retrieval method used today, the newer Recursive Language Models approach may extend the effective context length much better. I recommend you read about it if you aren’t familiar with it yet.
How does extending the effective context length enable model behavior to change between sessions?
Making the usable parts of the context window effectively longer enables the system to add extra data that the model wasn’t originally trained on. This data is then used to modify how the model responds to future queries. All major AI chat products use this method to inject knowledge about your specific preferences into the context window of each new chat session.
The way an AI becomes personalized over time is a perfect example of how a model with a “fixed brain” can still change its behavior. It’s important to note that the model itself isn’t learning; its weights and biases aren’t changed by your interactions with it. Instead, a team of AI Agents works together behind the scenes, communicating over shared external storage to gradually modify how the LLM behaves.
The following is an explanation of how these AI agents cooperate to complete this ongoing task. One AI instance identifies and summarizes important details from your current chat session. Once the session ends, another AI instance takes that summary and integrates it into a permanent file of everything the system knows about you. Finally, when you start a new chat session, a fresh AI instance reads that file and pastes the relevant parts into its own context window to guide its personality.
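The three-step handoff above can be sketched as follows. Each function stands in for a separate LLM instance, the shared "permanent file" is modeled as a plain dict, and all names and data are illustrative, not a real framework's API.

```python
def summarize_session(transcript):
    # Agent 1: extract the durable details from the finished session.
    # (A real agent would judge importance; this toy just filters by tag.)
    return [line for line in transcript if line.startswith("FACT:")]

def update_profile(profile, session_facts):
    # Agent 2: merge the session summary into the permanent user file
    # held in shared external storage.
    profile.setdefault("facts", []).extend(session_facts)
    return profile

def start_new_session(profile):
    # Agent 3: paste the stored facts into a fresh context window so the
    # new instance behaves as if it "remembers" the user.
    facts = "; ".join(profile.get("facts", []))
    return ["System: known user facts: " + facts]

profile = {}
transcript = ["User: hi", "FACT: user writes about AI evolution", "User: bye"]
profile = update_profile(profile, summarize_session(transcript))
new_context = start_new_session(profile)
print(new_context[0])
```

Notice that no function ever touches model weights: everything the system "learns" lives in the external `profile` and re-enters the model only through the context window of the next session.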
This system of different AI instances collaborating to achieve a goal is called an Agentic Framework. It is the software manager that orchestrates these autonomous AI agents, allowing them to use external storage and services to act as a cohesive, learning entity without ever changing their underlying model parameters.
By enabling even fixed-weight AI models to modify their behavior over time, Agentic Frameworks enable AIs to start responding to evolutionary pressures. More on that in my next posts…
I need your help to grow this channel’s community, so if you enjoy my writing then please Subscribe and share my posts with others.
