LLMs use a surprisingly simple mechanism to retrieve some stored knowledge

In an effort to better understand what is going on under the hood, researchers at MIT and elsewhere studied the mechanisms at work when these enormous machine-learning models retrieve stored knowledge. They found a surprising result: Large language models often use a very simple linear function to recover and decode stored facts. The researchers showed that, by identifying linear functions for different facts, they can probe the model to see what it knows about new subjects, and where within the model that knowledge is stored.

Using a technique they developed to estimate these simple functions, the researchers found that even when a model answers a prompt incorrectly, it has often stored the correct information.

Most large language models, also called transformer models, are neural networks. The researchers set up a series of experiments to probe LLMs and found that, even though the models are extremely complex, they decode relational information using a simple linear function.

"This is an exciting work that reveals a missing piece in our understanding of how large language models recall factual knowledge during inference. Previous work showed that LLMs build information-rich representations of given subjects, from which specific attributes are being extracted during inference. This work shows that the complex nonlinear computation of LLMs for attribute extraction can be well-approximated with a simple linear function," says Mor Geva Pipek, an assistant professor in the School of Computer Science at Tel Aviv University, who was not involved with this work.
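To make the core idea concrete, here is a minimal, hypothetical sketch (not the researchers' released code) of what "approximating attribute extraction with a linear function" can look like: for a fixed relation, a subject's hidden-state vector s is mapped to an attribute representation o by an affine function o ≈ W·s + b, with W and b fit by least squares. The dimensions, the synthetic vectors, and the variable names below are placeholders standing in for hidden states that would actually be read out of a transformer.

```python
import numpy as np

# Hypothetical illustration of the linear-approximation idea:
# for one relation (e.g., "capital city of"), assume the model's
# nonlinear map from subject hidden state s to attribute state o
# is roughly affine, o ≈ W s + b, and estimate W, b from examples.

rng = np.random.default_rng(0)
d = 64          # hidden-state dimensionality (placeholder value)
n_pairs = 200   # number of (subject, attribute) examples for this relation

# Placeholder "hidden states"; in practice these would come from an LLM.
S = rng.normal(size=(n_pairs, d))                 # subject representations
W_true = rng.normal(size=(d, d)) / np.sqrt(d)     # unknown underlying relation map
b_true = rng.normal(size=d)
O = S @ W_true.T + b_true + 0.01 * rng.normal(size=(n_pairs, d))  # attribute representations

# Fit the affine approximation o ≈ W s + b by least squares.
S_aug = np.hstack([S, np.ones((n_pairs, 1))])     # append a bias column
coef, *_ = np.linalg.lstsq(S_aug, O, rcond=None)
W_hat, b_hat = coef[:-1].T, coef[-1]

# If the relation really is (close to) linear, predictions track the targets.
pred = S @ W_hat.T + b_hat
print("mean prediction error:", np.mean(np.linalg.norm(pred - O, axis=1)))
```

A fit like this is also what makes the probing described above possible: once W and b are estimated for a relation, applying them to a new subject's representation suggests what attribute the model has stored, even if the model's generated answer is wrong.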


