Predicting Stock Prices with Language Models

Using language models to forecast stock price movements through financial and news data.

Table of Contents

Combining Different Types of Data
Using Language Models
How We Did It
Summarizing News Articles
Creating Prompts for Predictions
Testing Our Predictions
Results and Findings
Why It Matters
Future Directions
Conclusion

Predicting stock prices is like guessing whether your cat will knock over a glass of water. It involves looking at a lot of factors, from a company’s financial performance to what people are saying on social media. If you combine financial reports, past stock prices, and recent news articles, you can build a pretty good picture of what might happen next.

Combining Different Types of Data

To make our stock predictions, we need to pull together information from multiple places. This includes:

Financial Data: This is the nitty-gritty stuff like income statements and balance sheets. Every public company in the U.S. has to share this information quarterly. It’s like showing your report card to your parents.
Historical Price Data: This looks at how a stock has performed in the past. If a company’s stock price has gone up and down like a roller coaster, it might give us clues about what could happen in the future.
News Articles: Investors often pay close attention to news. Social media and news stories are like the gossip of the stock market; they can influence how people feel about a company.

Using Language Models

We decided to use some fancy technology called Large Language Models (LLMs) to help us make predictions. These models are like very smart robots that can read and understand text. They can handle both structured data (like numbers) and unstructured data (like news articles). By feeding the model financial data and relevant news articles, we prompt it to predict whether a stock's price might go up or down.

For our experiments, we used several types of LLMs, including GPT-3, GPT-4, and LLaMA versions. These models have shown they can classify both types of data effectively.

How We Did It

We gathered a bunch of news articles and financial reports for 20 publicly traded companies. These were chosen based on how frequently their stocks are traded. We then created a dataset that included:

5,000 news articles covering these companies from October 2021 to January 2024.
Financial data from the companies’ 10-K reports, which include various financial metrics.

We used a method called "retrieval augmentation" to find the most relevant news articles and attach them to the company’s financial data. This way, when we asked our models to predict stock price movements, they had all the necessary context.
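To make the retrieval step more concrete, here is a minimal sketch of how relevant articles could be picked for a company. It is not the pipeline used in the study: it assumes a simple bag-of-words cosine similarity in place of whatever retriever was actually used, and the function names (`tokenize`, `cosine_similarity`, `retrieve_articles`) and the sample headlines are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_articles(query, articles, top_k=2):
    """Return up to top_k articles most similar to the query, best first.

    Articles with no word overlap (similarity 0) are dropped.
    """
    query_vec = Counter(tokenize(query))
    scored = [(cosine_similarity(query_vec, Counter(tokenize(a))), a) for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article for score, article in scored[:top_k] if score > 0]


# Hypothetical headlines, invented for this example.
articles = [
    "Acme Corp reports record quarterly revenue and raises guidance",
    "Local bakery wins regional pastry award",
    "Acme Corp faces lawsuit over product recall",
]
relevant = retrieve_articles("Acme Corp revenue", articles, top_k=2)
```

In this toy run, `relevant` holds the two Acme headlines, with the revenue story ranked first. A real system would more likely use learned embeddings, but the idea is the same: score each article against the company query and keep the best matches.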

Summarizing News Articles

With so much news out there, we had to figure out how to summarize it. We employed a couple of methods:

Extractive Summarization: This method picks out important sentences from an article. It’s like finding the best quotes from a movie without watching the whole thing.
Abstractive Summarization: This technique generates new sentences that capture the essence of the articles. Imagine someone summarizing a two-hour film into a single sentence.

By using these summarization techniques, we could focus on the parts of the news that most influenced stock prices.
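As a rough illustration of the extractive approach, the sketch below scores each sentence by the average frequency of its words across the article and keeps the top-scoring ones. This is one classic heuristic, assumed here for illustration and not necessarily the summarizer used in the study. The stopword list, the function names, and the sample text are all hypothetical.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption for this example).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for",
             "is", "its", "was", "that", "with"}


def content_words(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, num_sentences=1):
    """Keep the num_sentences highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


# Hypothetical article text, invented for this example.
sample = ("Acme shares rose after strong earnings. "
          "The weather was mild. "
          "Acme earnings beat analyst forecasts.")
summary = extractive_summary(sample, num_sentences=1)
```

Abstractive summarization would instead ask a language model to write a new sentence, so it does not reduce to a few lines of standard-library code like this.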

Creating Prompts for Predictions

When we fed information into our LLMs, we needed to be careful about how we structured our prompts. Think of prompts as questions you ask to get an answer. We experimented with different ways of organizing the information we provided, since the order can really change how well the model performs. We included sections about the company, its recent news, its financial data, and then we asked our main question: "Should I invest in this company?"
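One possible way to lay out such a prompt is sketched below. The section order follows the description above (company, news, financial data, question), but the section labels, the `build_prompt` helper, and the sample values are assumptions made for illustration, not the exact template from the study.

```python
def build_prompt(company, news_summaries, financials,
                 question="Should I invest in this company?"):
    """Assemble a prompt with company, news, financial data, then the question."""
    news_lines = "\n".join(f"- {summary}" for summary in news_summaries)
    financial_lines = "\n".join(f"{name}: {value}" for name, value in financials.items())
    sections = [
        f"Company:\n{company}",
        f"Recent news:\n{news_lines}",
        f"Financial data:\n{financial_lines}",
        f"Question: {question}",
    ]
    return "\n\n".join(sections)


# Hypothetical inputs, invented for this example.
prompt = build_prompt(
    company="Acme Corp",
    news_summaries=["Earnings beat analyst forecasts."],
    financials={"Revenue": "1.2B", "Net income": "150M"},
)
```

Because the order of sections can affect model performance, keeping the layout in one small function makes it easy to reorder the `sections` list and compare variants.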

Testing Our Predictions

To see how well our models did, we prepared a set of sample prompts. We tested our models under three settings (zero-shot, two-shot, and four-shot) to see which one worked best.

Zero-shot setting: We just asked the model the question with no prior examples.
Two-shot setting: We provided two examples.
Four-shot setting: We provided four examples.

Surprisingly, adding more examples didn’t always lead to better accuracy. It was like trying to teach an old dog new tricks: it doesn’t always work!
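The three settings differ only in how many worked examples are placed before the question. The sketch below shows one way to assemble such prompts and to score the answers. The "Answer:" format, the "up"/"down" labels, and both helper names are assumptions for illustration, and the example data is made up.

```python
def build_few_shot_prompt(examples, query, n_shots):
    """Prefix the query with n_shots solved examples (0 gives the zero-shot prompt).

    examples is a list of (prompt_text, answer) pairs.
    """
    if n_shots < 0 or n_shots > len(examples):
        raise ValueError("n_shots must be between 0 and len(examples)")
    parts = [f"{text}\nAnswer: {answer}" for text, answer in examples[:n_shots]]
    parts.append(f"{query}\nAnswer:")
    return "\n\n".join(parts)


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical solved examples and model outputs, invented for this example.
examples = [
    ("Company A: earnings beat forecasts.", "up"),
    ("Company B: product recall announced.", "down"),
    ("Company C: guidance raised.", "up"),
    ("Company D: CEO resigns unexpectedly.", "down"),
]
zero_shot = build_few_shot_prompt(examples, "Company E: new contract signed.", 0)
two_shot = build_few_shot_prompt(examples, "Company E: new contract signed.", 2)
score = accuracy(["up", "down", "up", "up"], ["up", "down", "down", "up"])
```

Running the same test set through the zero-, two-, and four-shot prompts and comparing the resulting `accuracy` values is what lets you see whether extra examples actually help.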

Results and Findings

Our research showed that different models performed differently. Some models, like GPT-4 and LLaMA3, were better at predicting stock price movements. The best results came from models that could balance both types of data: financial numbers and news snippets.

Why It Matters

So why should anyone care about these predictions? Well, knowing if a stock's price might go up or down can help investors make better decisions. If a model can accurately predict these movements, it could save people from making poor investment choices, like buying a stock just before it tumbles.

Future Directions

We learned that while using large language models in this way is promising, there’s still a lot to improve on. For our next steps, we plan to fine-tune smaller models that combine both textual and numerical data. We’re also interested in changing our approach from simple predictions of whether the stock will go up or down to predicting how much it might change in percentage terms. Stocks are a tricky business, but we’re eager to keep learning!
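To show the difference between the two prediction targets, here is a small sketch that turns a pair of closing prices into either an up/down label or a percentage change. The helper names are hypothetical, and treating an unchanged price as "down" is an arbitrary assumption made only to keep the label binary.

```python
def percent_change(previous_close, current_close):
    """Percentage change from the previous close to the current close."""
    if previous_close <= 0:
        raise ValueError("previous_close must be positive")
    return (current_close - previous_close) / previous_close * 100.0


def direction_label(previous_close, current_close):
    """Binary target: 'up' if the price rose, otherwise 'down' (ties count as 'down')."""
    return "up" if current_close > previous_close else "down"


# Hypothetical prices, invented for this example.
change = percent_change(100.0, 105.0)   # about 5 percent
label = direction_label(100.0, 105.0)   # "up"
```

The binary label throws away the size of the move, so a gain of 0.1% and a gain of 10% look identical. Predicting `percent_change` directly turns the task from classification into regression.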

Conclusion

In the end, predicting stock prices is a complex but exciting challenge. With the right mix of financial data, news articles, and smart technology, we can improve our chances of making accurate predictions. And who knows? Maybe one day, there will be a cat that doesn’t knock over any water glasses!
