Language Models and Brain Activity: A Study
Investigating connections between language models and brain responses during story listening.
Eunji Kim, Sriya Mantena, Weiwei Yang, Chandan Singh, Sungroh Yoon, Jianfeng Gao
Setting Up the Models
We train two variants of the language model: one uses the GPT-2 tokenizer, the other the LLaMA-2 tokenizer. The GPT-2 variant has four transformer layers, while the LLaMA-2 variant has three. Think of them as two cars built for the same road, just with slightly different engines.
Because the model compares words by where they sit relative to one another, it uses relative positional encoding to keep track of each word's position in the sequence. The GPT-2 variant covers up to 32 positions, while the LLaMA-2 variant handles 64, a bit like having a bigger parking lot for more cars. Each model inherits its vocabulary from the tokenizer it is paired with, so everything fits together cleanly.
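To make that setup concrete, here is a minimal sketch of the two configurations as a plain Python dataclass. The field names, and the idea of storing the setup this way at all, are illustrative assumptions rather than the authors' actual configuration code; only the layer counts, position limits, and inherited vocabularies come from the description above.

```python
# Minimal sketch of the two configurations described above.
# The schema is illustrative, not the authors' actual config code.
from dataclasses import dataclass

@dataclass
class SimilarityModelConfig:
    tokenizer: str       # which pretrained tokenizer supplies the vocabulary
    vocab_size: int      # inherited from that tokenizer
    n_layers: int        # number of transformer layers
    max_positions: int   # context length covered by relative positional encoding

# GPT-2-tokenizer variant: 4 layers, 32 relative positions, GPT-2 vocabulary.
gpt2_cfg = SimilarityModelConfig("gpt2", 50257, n_layers=4, max_positions=32)

# LLaMA-2-tokenizer variant: 3 layers, 64 relative positions, LLaMA-2 vocabulary.
llama2_cfg = SimilarityModelConfig("llama-2", 32000, n_layers=3, max_positions=64)
```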
Creating Similarity Pairs with Language Models
To train these models, we use LLaMA-2 as a teacher. We gather text from a range of sources and tokenize it with whichever tokenizer the model uses. During training we randomly sample sequences of 32 or 64 tokens, with batch sizes of 128 or 256, so each training step covers a large number of token contexts.
We then create pairs of tokens that count as similar, based on how often they appear together in the training material. Think of similarity pairs as pairs of friends who tend to hang out together. The models learn to predict the next token from what they have seen so far, trained with a combination of loss functions that pull their predictions closer to the right answers over time. Training runs for a long stretch on high-powered GPUs.
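As a rough illustration of the "friends who hang out together" idea, the sketch below counts how often two tokens land in the same fixed-size window and keeps the frequent pairs. It is a toy co-occurrence counter, not the study's actual similarity computation, which also relies on the LLaMA-2 teacher and a learned similarity metric.

```python
# Toy co-occurrence counter: which token pairs keep showing up in the same window?
from collections import Counter
from itertools import combinations

def cooccurrence_pairs(token_sequences, window=32, min_count=5):
    """Count unordered token pairs that share a window; keep the frequent ones."""
    counts = Counter()
    for seq in token_sequences:
        for start in range(0, len(seq), window):
            chunk = set(seq[start:start + window])          # unique tokens in one window
            counts.update(combinations(sorted(chunk), 2))   # every unordered pair
    return [pair for pair, count in counts.items() if count >= min_count]

# Toy example with integer token ids: prints [(1, 2), (1, 5)].
print(cooccurrence_pairs([[1, 2, 3, 2, 5, 1], [2, 5, 7, 1]], window=4, min_count=2))
```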
Finding the Right Threshold for Estimations
Once the models are trained, we need to set a threshold for effective predictions: a value that determines when the model is doing well enough to rely on. To find the best setting, we tried a range of values on a training set of 100 million tokens, much like testing several recipes to find the tastiest one.
We looked at six datasets to see how the setting affected performance. Each dataset took a turn as the test set while the others were used to build the model, and we compared accuracy across different threshold values. The GPT-2-tokenizer model worked best with the threshold set to 8, while the LLaMA-2-tokenizer model performed better at 9.
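The sweep itself can be pictured as a small grid search, as in the hedged sketch below. Here `evaluate_fn` stands in for whatever accuracy measurement the study actually uses; only the shape of the search (try each candidate, average over held-out datasets, keep the best) is being illustrated.

```python
def pick_threshold(datasets, evaluate_fn, candidates=range(4, 13)):
    """Return the threshold with the best average held-out score.

    `evaluate_fn(threshold, dataset)` should return next-token accuracy on one
    held-out dataset; it is a placeholder for the study's actual evaluation.
    """
    best_t, best_score = None, float("-inf")
    for t in candidates:
        # Leave-one-out style: average the score across the held-out datasets.
        score = sum(evaluate_fn(t, d) for d in datasets) / len(datasets)
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Reported outcome above: roughly 8 for the GPT-2-tokenizer model and 9 for LLaMA-2.
```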
Comparing Next-Token Accuracy
In our evaluations, we used several datasets as references. For some we built our own reference data; for others we relied on publicly available models. We then tested how well each model predicted the next word in a sequence.
Comparing the models, we found that the one that takes longer to generate a response often produces better output. This is like waiting longer for a proper meal at a restaurant instead of grabbing a quick snack: the longer wait can lead to a more satisfying result.
We also looked at cases where the models could match words in the context exactly and cases where they had to fall back on fuzzy matches. Fuzzy matching is like recognizing a friend in a crowd: even without a clear view, their clothing or hairstyle gives you a good idea of who they are.
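The difference between the two modes can be sketched on a toy token list. Exact matching looks for earlier occurrences of the current token and reuses whatever followed them (a simplified one-token version of the induction-head idea, which in the paper matches longer suffixes); fuzzy matching instead scores earlier tokens with a similarity function. The toy similarity here, counting shared leading characters, is only a stand-in for the learned neural metric.

```python
def exact_match_candidates(context):
    """Tokens that followed earlier exact occurrences of the last token."""
    last = context[-1]
    return [context[i + 1] for i in range(len(context) - 1) if context[i] == last]

def fuzzy_match_candidates(context, similarity, top_k=3):
    """Tokens that followed the earlier tokens most similar to the last token."""
    last = context[-1]
    scored = [(similarity(context[i], last), context[i + 1])
              for i in range(len(context) - 1)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [token for _, token in scored[:top_k]]

def toy_similarity(a, b):
    """Toy stand-in for the neural metric: shared leading characters."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

ctx = ["the", "cat", "sat", "on", "the", "mat", "while", "the"]
print(exact_match_candidates(ctx))                  # ['cat', 'mat']
print(fuzzy_match_candidates(ctx, toy_similarity))  # most-similar continuations
```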
Insights from fMRI Data
We also examined brain activity with fMRI, a method that records how the brain responds while people listen to stories. We collected data from three participants as they listened to narrative podcasts; they did not need to respond, they simply listened.
Across several scanning sessions, each subject heard roughly 20 hours of unique stories, and each session provided a large number of data points to analyze. From these recordings we measured how well the brain tracked the stories and fit a model that predicts brain activity from the words being heard.
To analyze the data, we removed noise, made sure the recordings were properly aligned, and carefully excluded segments that could confound our conclusions. The goal was to test whether language understanding can be linked to specific patterns of brain activity.
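A common way to build this kind of encoding model is regularized linear regression from word-derived features to each voxel's response. The sketch below uses ridge regression on random placeholder arrays purely to show the shape of the computation; it should not be read as the study's exact pipeline.

```python
# Hedged sketch of an encoding model: predict each voxel's response from
# features derived from the words being heard (placeholder random data).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_train = rng.standard_normal((500, 64))    # time points x word-derived features
Y_train = rng.standard_normal((500, 1000))  # time points x voxels
X_test = rng.standard_normal((100, 64))
Y_test = rng.standard_normal((100, 1000))

encoder = Ridge(alpha=10.0).fit(X_train, Y_train)
Y_pred = encoder.predict(X_test)

# Score each voxel by the correlation between predicted and measured responses.
corr = [np.corrcoef(Y_pred[:, v], Y_test[:, v])[0, 1] for v in range(Y_test.shape[1])]
print(float(np.mean(corr)))
```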
Fuzzy Matching in Brain Responses
For the brain data, we built a fuzzy matching model that captures how closely words relate to one another even when they are not exact matches. It weighs how likely each next word is according to its similarity to the words seen before.
By resampling the word-level features to match the slower timing of the fMRI signal, we could make more accurate predictions of the brain responses that correspond to the words being heard. This helped show how different but related words can trigger similar brain activity.
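One simple way to picture that timing alignment is to pool the features of all words that fall inside each scan interval (TR), after shifting them by a rough hemodynamic delay. The sketch below does exactly that; the 4-second lag and the plain averaging are assumptions for illustration, since the actual preprocessing likely uses a more careful interpolation.

```python
import numpy as np

def resample_to_trs(word_times, word_features, tr=2.0, n_trs=100, lag=4.0):
    """Average word features within each scan interval, after a rough lag shift.

    word_times: onset of each word in seconds; word_features: (n_words, n_dims).
    """
    word_times = np.asarray(word_times, dtype=float) + lag   # assumed ~4 s hemodynamic delay
    word_features = np.asarray(word_features, dtype=float)
    pooled = np.zeros((n_trs, word_features.shape[1]))
    for t in range(n_trs):
        in_tr = (word_times >= t * tr) & (word_times < (t + 1) * tr)
        if in_tr.any():
            pooled[t] = word_features[in_tr].mean(axis=0)    # average the words in this TR
    return pooled
```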
Comparing Prediction Performance
Next, we tested the fuzzy matching model against the exact matching model on the brain data. The fuzzy induction model did not surpass the exact matching model by much, possibly because fMRI data are noisy and not always easy to interpret.
Think of listening to a song in a crowded room: you catch the tune but not every word. The fuzzy model behaves the same way, picking up the general pattern while missing some fine detail. The results showed that similar words can activate the same brain areas, but the differences between the models were often subtle.
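Given voxelwise correlations from the two encoding models (computed as in the earlier sketch), the comparison described here boils down to the relative change in their means. The helper below is only a restatement of that arithmetic, not code or numbers from the study.

```python
import numpy as np

def relative_improvement(corr_fuzzy, corr_exact):
    """Relative change in mean voxelwise correlation (0.20 would mean 20% higher)."""
    mean_fuzzy, mean_exact = np.mean(corr_fuzzy), np.mean(corr_exact)
    return (mean_fuzzy - mean_exact) / abs(mean_exact)

# Toy numbers only, not results from the paper.
print(relative_improvement([0.12, 0.18, 0.15], [0.10, 0.16, 0.12]))
```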
Real-World Applications
Understanding the connection between language and the brain could help in several fields: improving teaching methods, shedding light on how to support people with language difficulties, or building artificial intelligence that mirrors human understanding more precisely.
In summary, as we develop these models and explore the brain's responses, it becomes clearer how language works at many levels, from the algorithms that drive machine learning to the neural circuits in our brains. It is an exciting field full of possibilities, and while the learning process can be complex, it can also be quite entertaining!
Title: Interpretable Language Modeling via Induction-head Ngram Models
Abstract: Recent large language models (LLMs) have excelled across a wide range of tasks, but their use in high-stakes and compute-limited settings has intensified the demand for interpretability and efficiency. We address this need by proposing Induction-head ngram models (Induction-Gram), a method that builds an efficient, interpretable LM by bolstering modern ngram models with a hand-engineered "induction head". This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions. This process enables Induction-Gram to provide ngram-level grounding for each generated token. Moreover, experiments show that this simple method significantly improves next-word prediction over baseline interpretable models (up to 26%p) and can be used to speed up LLM inference for large models through speculative decoding. We further study Induction-Gram in a natural-language neuroscience setting, where the goal is to predict the next fMRI response in a sequence. It again provides a significant improvement over interpretable models (20% relative increase in the correlation of predicted fMRI responses), potentially enabling deeper scientific investigation of language selectivity in the brain. The code is available at https://github.com/ejkim47/induction-gram.
Authors: Eunji Kim, Sriya Mantena, Weiwei Yang, Chandan Singh, Sungroh Yoon, Jianfeng Gao
Last Update: 2024-10-31 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2411.00066
Source PDF: https://arxiv.org/pdf/2411.00066
Licence: https://creativecommons.org/licenses/by-nc-sa/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arxiv for use of its open access interoperability.
Reference Links
- https://huggingface.co/datasets/monology/pile-uncopyrighted
- https://github.com/karpathy/minGPT
- https://infini-gram.io/api_doc.html
- https://infini-gram.io/pkg_doc.html
- https://github.com/AlexWan0/infini-gram/tree/main
- https://github.com/ejkim47/induction-gram
- https://babylm.github.io/
- https://huggingface.co/TinyLLaMA/TinyLLaMA-1.1B-intermediate-step-1431k-3T
- https://github.com/OpenNeuroDatasets/ds003020