Simple Science

Cutting edge science explained simply


Understanding Soccer Predictions in England

A look into predicting soccer match outcomes across different leagues.

Josh Brown, Yutong Bu, Zachary Cheesman, Benjamin Orman, Iris Horng, Samuel Thomas, Amanda Harsy, Adam Schultze




Soccer, or as some might call it, football, has a long history in England. The official rules for the sport were set in 1863, making it one of the oldest organized sports. Over the years, the game has grown and evolved into a well-structured league system known as the English football pyramid. At the top of this pyramid is the English Premier League (EPL), the crème de la crème of football leagues, not just in England but in the world! The EPL is where the big bucks are; during the 2022-2023 season, it brought in a whopping $6.9 billion in revenue. That's like having a premium seat to the biggest show in town, while other leagues, like the English Championship, League One, and League Two, are left picking up the crumbs from the table.

The Tiered League System

This football pyramid is unique because it offers a promotion and relegation system, a bit like a game of musical chairs. If you perform well in your league, you get promoted to a higher tier, and if you do poorly, well, you might find yourself moving down a tier. For example, a team that manages to jump from League Two to the Premier League can see an incredible boost in revenue: at least $160 million over three years! That's a nice payday for a team that could have been living on ramen noodles before.

However, not all leagues are created equal. The financial differences between them are significant. The Championship made about $890 million in the same year, while League One and League Two brought in $280 million and $156 million, respectively. These differences create very intense competition in all tiers of English club soccer. Everyone wants to be top of the heap!

The Difficulty of Predicting Outcomes

Despite the excitement and competition, predicting the outcomes of these matches isn't as easy as flipping a coin. In fact, it turns out that forecasting games in lower leagues is generally tougher than in the high-flying Premier League. That's because the lesser-known teams can be a bit unpredictable. However, when we remove teams that consistently dominate in their leagues, we find that predicting the Premier League can be just as tricky as the lower leagues.

Previous Research and Data Limitations

Despite the wealth of data available about the English football leagues, little research has examined the lower-level leagues. Most studies focus on the top-flight leagues, leaving the lower tiers in the dark. One exception is the work of Arntzen and Hvattum, who used the Elo rating system to predict match outcomes in lower leagues. However, classical mathematical ranking models like those created by Massey and Colley haven't been fully explored in these lower leagues.

The Role of Player Valuations

To help us with our predictions, we turned to player valuations from Transfermarkt, a site where fans discuss the worth of players. It’s like a fancy online bazaar where soccer enthusiasts haggle over who’s worth what. This crowd-sourced approach to determining player values is pretty popular among scouts and club executives, giving it a bit of street cred.

We decided to see if these valuations could help us predict the outcomes of games in lower leagues. The idea is that if fans are talking about player values, they might be onto something when predicting how teams will perform. After all, if a player is highly valued, they might bring a bit more talent to the field.

The Structure of Our Study

In our research, we set out to compare different mathematical models to see how they can predict outcomes at various levels of the English soccer system. We will break our findings into sections:

  1. Introduction to the Colley and Massey Ranking Methods: We'll give some background on these mathematical ranking methods and why they’re useful.

  2. Data and Metrics: We'll cover how we gathered our data and what metrics we used to evaluate our models.

  3. Modeling Approaches: We will delve into our different modeling methods, including the Transfermarkt valuations.

  4. Analysis of Predictions: We’ll share how our models performed against actual game outcomes across English, German, and Scottish leagues.

  5. Conclusions and Future Directions: Finally, we’ll wrap up with what our findings mean and potential areas for further research.

The Colley and Massey Ranking Methods

The Colley and Massey methods are two classical ranking systems used to evaluate the performance of sports teams. Both methods use statistics from past games, but they approach the data differently.

The Colley method focuses on win percentage and the strength of the teams played. It’s like trying to figure out how good a team is by considering not just how many games they won, but also who they played against. If a team has a high win percentage but has faced weak opponents, their ranking may not be as high.
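To make this concrete, here is a minimal Python sketch of the standard Colley system, not the study's own code. It assumes each game is a hypothetical `(home, away, home_goals, away_goals)` tuple and that a draw counts as a game played without changing either side's win-minus-loss tally; how the authors actually treat draws is not stated here. Each team's rating satisfies (2 + games played) × rating − sum of its opponents' ratings = 1 + (wins − losses) / 2, and the sketch solves those equations with repeated sweeps instead of a matrix library.

```python
def colley_ratings(games, sweeps=1000):
    """Colley ratings from (home, away, home_goals, away_goals) tuples.

    Assumption: a draw counts as a game played but leaves wins - losses
    unchanged for both teams.
    """
    teams = sorted({t for home, away, _, _ in games for t in (home, away)})
    played = {t: 0 for t in teams}
    net_wins = {t: 0 for t in teams}      # wins minus losses
    meetings = {t: {} for t in teams}     # opponent -> number of games

    for home, away, home_goals, away_goals in games:
        played[home] += 1
        played[away] += 1
        meetings[home][away] = meetings[home].get(away, 0) + 1
        meetings[away][home] = meetings[away].get(home, 0) + 1
        if home_goals > away_goals:
            net_wins[home] += 1
            net_wins[away] -= 1
        elif away_goals > home_goals:
            net_wins[away] += 1
            net_wins[home] -= 1

    # Gauss-Seidel sweeps: the Colley system is strictly diagonally
    # dominant, so updating one rating at a time converges.
    ratings = {t: 0.5 for t in teams}
    for _ in range(sweeps):
        for t in teams:
            opponent_sum = sum(n * ratings[o] for o, n in meetings[t].items())
            ratings[t] = (1 + net_wins[t] / 2 + opponent_sum) / (2 + played[t])
    return ratings


# Toy round robin: A beats B and C, B beats C.
ratings = colley_ratings([("A", "B", 2, 0), ("A", "C", 1, 0), ("B", "C", 3, 1)])
print(ratings)  # approximately A: 0.7, B: 0.5, C: 0.3
```

Every team starts at 0.5, and the ratings always average 0.5, so a win moves a team up only at its opponent's expense. Beating a highly rated opponent is worth more because that opponent's rating feeds back into the winner's own equation.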

On the other hand, the Massey method uses the score differential in games, which in soccer means the goal difference. This method assumes that the strength of teams affects the final score of a match. For example, if Team A beats Team B by a large margin, we can infer that Team A is stronger.
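A rough Python sketch of the textbook Massey system follows; it is an illustration under stated assumptions, not the authors' implementation. Games are again hypothetical `(home, away, home_goals, away_goals)` tuples. The method asks for ratings whose differences match the observed goal margins as closely as possible in a least-squares sense. Because only rating differences matter, the equations are underdetermined, so the usual fix (used here) is to replace the last equation with the constraint that all ratings sum to zero.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def massey_ratings(games):
    """Massey ratings from (home, away, home_goals, away_goals) tuples.

    Assumes every team is linked to every other through some chain of
    games; otherwise the system has no unique solution.
    """
    teams = sorted({t for home, away, _, _ in games for t in (home, away)})
    index = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    matrix = [[0.0] * n for _ in range(n)]
    margins = [0.0] * n                   # total goal difference per team

    for home, away, home_goals, away_goals in games:
        i, j = index[home], index[away]
        matrix[i][i] += 1
        matrix[j][j] += 1
        matrix[i][j] -= 1
        matrix[j][i] -= 1
        margins[i] += home_goals - away_goals
        margins[j] -= home_goals - away_goals

    # Replace the last equation with "ratings sum to zero".
    matrix[-1] = [1.0] * n
    margins[-1] = 0.0
    return dict(zip(teams, solve_linear(matrix, margins)))


# Same toy round robin: A beats B 2-0 and C 1-0, B beats C 3-1.
massey = massey_ratings([("A", "B", 2, 0), ("A", "C", 1, 0), ("B", "C", 3, 1)])
print(massey)  # approximately A: 1.0, B: 0.0, C: -1.0
```

The difference between two ratings can be read as an expected goal margin: in this toy example the sketch would expect A to beat C by about two goals.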

Data Collection and Metrics

Our study involved collecting a bunch of data from various leagues over several years. We grabbed game results, team rosters, and player valuations from Transfermarkt, which is like a treasure trove of soccer statistics.

We focused on the top four tiers of the English football league system, along with data from some German and Scottish leagues. The goal was to compile a solid dataset that we could use to test our predictive models.

Modeling Approaches

We put a couple of different models to the test. First, we used the classic Colley and Massey rankings by themselves. Then, we added some twists, such as including home field advantage and player valuations from Transfermarkt to see if those factors could improve our predictions.

For our betting odds model, we relied on the wisdom of the betting world. Bookmakers know their stuff and have a keen eye for predicting outcomes, so we thought it’d be smart to compare our models to their odds.
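How the odds were turned into predictions is not spelled out here. One common approach, sketched below as an assumption rather than the study's method, is to invert decimal odds and rescale them so the three outcomes sum to one, which strips out the bookmaker's margin. The function name and the example odds are made up for illustration.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Convert decimal odds to probabilities that sum to 1.

    The raw inverses add up to slightly more than 1 because of the
    bookmaker's margin, so each one is divided by their total.
    """
    raw = {
        "home": 1 / home_odds,
        "draw": 1 / draw_odds,
        "away": 1 / away_odds,
    }
    total = sum(raw.values())
    return {outcome: value / total for outcome, value in raw.items()}


# Made-up odds for a single match.
probs = implied_probabilities(1.8, 3.6, 4.5)
favourite = max(probs, key=probs.get)
print(favourite, round(probs[favourite], 3))  # home 0.526
```

Picking the outcome with the highest implied probability gives a simple bookmaker baseline that ranking-based models can be measured against.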

Analyzing Our Predictions

Once we had our models in place, we assessed how well they performed by comparing their predictions to real match outcomes. We focused on metrics like ranking accuracy and game result predictions.

Our models showed interesting patterns. The predictions for Premier League games turned out to be more accurate than those for lower leagues. But when we removed games involving top teams, the differences in accuracy between leagues became less pronounced.

The Impact of Dominant Teams

Our findings brought to light the significant impact that dominant teams, often referred to as the “Big Six” in the Premier League, have on prediction models. These teams have historically performed better and skew the predictions in their favor.

We ran the models again, this time excluding any games involving these dominant teams. Surprisingly, this brought our prediction accuracy for the Premier League down to roughly the level we saw in the lower leagues! It seems that a few dominant teams make the top league look more predictable than it really is.
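The comparison can be sketched in a few lines of Python. Everything here is illustrative: the team names, ratings, and results are invented, the "higher rating wins" rule stands in for whichever model is being tested, and skipping draws is a simplifying assumption rather than the study's scoring rule. In the Premier League, the Big Six would play the role of the hypothetical `DOMINANT` set.

```python
DOMINANT = {"Giant A", "Giant B"}  # hypothetical dominant teams


def accuracy(games, ratings, exclude=frozenset()):
    """Share of decided games where the higher-rated team won.

    Games are (home, away, home_goals, away_goals) tuples. Draws and any
    game involving a team in `exclude` are skipped. If ratings are tied,
    this sketch predicts the away team.
    """
    correct = total = 0
    for home, away, home_goals, away_goals in games:
        if home_goals == away_goals or home in exclude or away in exclude:
            continue
        predicted_home_win = ratings[home] > ratings[away]
        actual_home_win = home_goals > away_goals
        correct += predicted_home_win == actual_home_win
        total += 1
    return correct / total if total else float("nan")


# Invented ratings and results, for illustration only.
toy_ratings = {"Giant A": 0.90, "Giant B": 0.85,
               "Mid X": 0.55, "Mid Y": 0.50, "Mid Z": 0.45}
toy_games = [
    ("Giant A", "Mid Y", 3, 0),   # favourite wins
    ("Giant B", "Mid Z", 2, 1),   # favourite wins
    ("Mid Y", "Mid Z", 0, 1),     # upset
    ("Mid X", "Mid Y", 2, 1),     # favourite wins
]

print(accuracy(toy_games, toy_ratings))            # 0.75
print(accuracy(toy_games, toy_ratings, DOMINANT))  # 0.5
```

In this toy data the games involving the two giants are the easy ones to call, so dropping them lowers the measured accuracy, which mirrors the pattern described above.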

Insights from Other Leagues

To broaden our understanding, we also evaluated the models using data from the German and Scottish leagues. While these leagues have their quirks, our findings generally aligned with what we discovered in the English leagues. The models performed better in top-tier leagues compared to lower leagues across the board.

Market Valuations and the Wisdom of the Crowd

The concept of "wisdom of the crowd" suggests that a larger group often arrives at a more accurate conclusion than an individual or a small group. In our case, if the crowd can effectively rate players on Transfermarkt, their insights should improve predictions, right? Well, sort of.

While we found that Transfermarkt valuations provided some predictive power, they didn’t necessarily outperform traditional methods when it came to club soccer. This raises the question: Is crowd-sourcing really all it's cracked up to be? Maybe those folks talking about player values are just throwing darts at a board after all.

Conclusion and Future Directions

In summary, our research shows that different mathematical models can help predict soccer match outcomes, but the effectiveness varies across leagues. The models were more accurate in the Premier League than in the lower leagues, but that advantage largely disappeared once games involving dominant teams were removed.

Looking ahead, we see plenty of room for improvement. There’s potential to refine models by better accounting for matches that end in draws or incorporating additional metrics like player statistics. Exploring the effects of dominant teams on competitive balance could also provide valuable insights.

With soccer’s global popularity, there’s no shortage of data to dig into. So grab your favorite snack and settle in, because the world of soccer analytics is just getting started!

Original Source

Title: Predictive Modeling of Lower-Level English Club Soccer Using Crowd-Sourced Player Valuations

Abstract: In this research, we examine the capabilities of different mathematical models to accurately predict various levels of the English football pyramid. Existing work has largely focused on top-level play in European leagues; however, our work analyzes teams throughout the entire English Football League system. We modeled team performance using weighted Colley and Massey ranking methods which incorporate player valuations from the widely-used website Transfermarkt to predict game outcomes. Our initial analysis found that lower leagues are more difficult to forecast in general. Yet, after removing dominant outlier teams from the analysis, we found that top leagues were just as difficult to predict as lower leagues. We also extended our findings using data from multiple German and Scottish leagues. Finally, we discuss reasons to doubt attributing Transfermarkt's predictive value to wisdom of the crowd.

Authors: Josh Brown, Yutong Bu, Zachary Cheesman, Benjamin Orman, Iris Horng, Samuel Thomas, Amanda Harsy, Adam Schultze

Last Update: 2024-11-13 00:00:00

Language: English

Source URL: https://arxiv.org/abs/2411.09085

Source PDF: https://arxiv.org/pdf/2411.09085

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
