Simple Science

Cutting edge science explained simply

Topics: Statistics, Computers and Society, Applications

Understanding Cast Vote Records from the 2020 Election

A look into the importance of cast vote records and their impact.

Shiro Kuriwaki, Mason Reece, Samuel Baltz, Aleksandra Conevska, Joseph R. Loffredo, Can Mutlu, Taran Samarth, Kevin E. Acevedo Jetter, Zachary Djanogly Garai, Kate Murray, Shigeo Hirano, Jeffrey B. Lewis, James M. Snyder, Charles H. Stewart


Ballots are the backbone of any election. They record the choices voters made, and they help keep our democracy running smoothly. After the 2020 U.S. Election, some jurisdictions opened up their electronic vote records, allowing the public to see the actual ballots cast. These records are known as cast vote records (CVRs). However, jurisdictions released them in a variety of formats, and nobody had independently evaluated them, which made life hard for anyone trying to make sense of the results.

We decided to take on the challenge by creating a database of CVRs from the 2020 U.S. general election. We gathered publicly available records from multiple states, standardized them, and compared their totals with the official certified election results. Our database includes votes for President, Governor, U.S. Senate, and U.S. House, plus state legislative races, covering approximately 42.7 million voters across 20 states.

This database is a handy resource for anyone interested in understanding how people voted and how elections were run. With this data, we found that in battleground states, about 1.9% of solid Republicans (defined by their congressional and state legislative votes) split their ticket to vote for Joe Biden, and about 1.2% of solid Democrats split theirs for Donald Trump.

What Are Cast Vote Records?

When we think of ballots, we typically think of the physical piece of paper filled out by voters. However, in today's world, we also have electronic records of these votes. These electronic records, known as cast vote records (CVRs), show the choices made by voters on their ballots. The National Institute of Standards and Technology describes CVRs as electronic records that capture how a voter voted.

While CVRs are not the be-all and end-all of election results, they play a crucial role in verifying and tallying votes. Because CVRs record each ballot's choices contest by contest, they are well suited to analysis.

In our study, we gathered data from CVRs that accounted for around 42.7 million voters. You can find this dataset online. Unlike certified election results, which are usually easy to find, CVRs are rarely collected at the state level. Instead, they are typically kept by local election officials.

After the 2020 election, local officials saw a big increase in requests for vote records. In some states, the number of requests grew four- to five-fold between 2020 and 2022. Thanks to the hard work of officials responding to these demands and making records available, researchers now have access to a treasure trove of CVRs.

Why CVRs Matter

The CVR dataset is important for various reasons. For political scientists, economists, and sociologists, this data allows for detailed study of voting patterns. Because CVRs record individual ballots, researchers can measure voting behavior far more accurately than they can with surveys or aggregate election data.

For instance, you can see how many people who voted for Trump for President also voted for Republican candidates further down their ballots. Plus, CVRs let researchers analyze the nuance of how people vote in different races, including state legislative contests that are often overlooked.

Ticket Splitting

A phenomenon known as ticket splitting happens when voters choose candidates from different parties for different offices. For example, some voters might pick Biden for President but vote for several Republicans for other offices. Aggregate data can tell you how many votes Trump received, but it can't show the details of ticket-splitting behavior. CVRs give researchers a clear picture of this behavior.
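To make the idea concrete, here is a minimal sketch of how split tickets could be flagged in ballot-level data. The row layout (ballot id, office, party) and the sample ballots are made up for illustration and are not the dataset's actual schema.

```python
from collections import defaultdict

# Hypothetical long-format rows: (ballot_id, office, party of the chosen candidate).
rows = [
    ("b1", "President", "DEM"),
    ("b1", "US Senate", "REP"),
    ("b1", "US House", "REP"),
    ("b2", "President", "REP"),
    ("b2", "US Senate", "REP"),
    ("b3", "President", "DEM"),
    ("b3", "US House", "DEM"),
]

def split_ticket_ballots(rows):
    """Return the ids of ballots whose choices span both major parties."""
    parties = defaultdict(set)
    for ballot_id, _office, party in rows:
        if party in ("DEM", "REP"):  # minor parties and blanks are ignored here
            parties[ballot_id].add(party)
    return {b for b, seen in parties.items() if len(seen) > 1}

split = split_ticket_ballots(rows)
share = len(split) / len({ballot_id for ballot_id, _, _ in rows})
print(sorted(split), round(share, 3))
```

Aggregate totals for the same three ballots would only show two Democratic and one Republican vote for President; the per-ballot grouping is what reveals that one of them crossed party lines.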

By analyzing this data, researchers can also find out how many voters backed Democratic candidates while also supporting progressive ballot measures. The CVR data includes geographic information about where ballots were cast, allowing insights into which districts have more split-ticket voters.

Another interesting area to explore is how ticket splitting relates to the types of media available in an area or even demographic variables like age, race, and income.

Other Ways to Use CVRs

CVRs are useful for more than just understanding ticket splitting. For example, they allow researchers to examine ranked-choice voting, look at instances of roll-off (when voters skip certain races), and analyze support for minor parties. The dataset contains numerous records for individuals who voted for minor party candidates, which is hard to capture in most surveys due to sample size limitations.
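Roll-off is easy to express in code once each ballot's choices are grouped together. The sketch below assumes a skipped contest appears as a row marked "UNDERVOTE"; that marker, the row layout, and the down-ballot candidate names are illustrative assumptions rather than the dataset's documented conventions.

```python
# Hypothetical rows: (ballot_id, office, choice); "UNDERVOTE" marks a skipped contest.
rows = [
    ("b1", "President", "Biden"), ("b1", "State House", "UNDERVOTE"),
    ("b2", "President", "Trump"), ("b2", "State House", "Smith"),
    ("b3", "President", "Trump"), ("b3", "State House", "UNDERVOTE"),
    ("b4", "President", "UNDERVOTE"), ("b4", "State House", "Jones"),
]

def roll_off_rate(rows, top="President", down="State House"):
    """Share of ballots with a valid vote for `top` that skipped `down`."""
    choices = {}
    for ballot_id, office, choice in rows:
        choices.setdefault(ballot_id, {})[office] = choice
    # Only ballots that voted at the top of the ticket and had `down` on the ballot count.
    eligible = [c for c in choices.values()
                if c.get(top, "UNDERVOTE") != "UNDERVOTE" and down in c]
    if not eligible:
        return 0.0
    skipped = sum(1 for c in eligible if c[down] == "UNDERVOTE")
    return skipped / len(eligible)

rate = roll_off_rate(rows)
```

Ballot b4 is excluded from the denominator because it has no valid presidential vote, so it cannot "roll off" from the top of the ticket.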

On the flip side, election lawyers and officials can use CVRs to study election integrity. Public concern about how votes are counted can lead to doubts about the system. Ballot-level data can help clarify any surprising election results and contribute to discussions about balancing transparency with voters' privacy.

The Bigger Picture

The events of November 2020 shifted how politics operates in the U.S. Election administration became a hot-button issue. Future elections may be scrutinized in similar ways, and the need for clarity and collaboration among election officials, data scientists, and social scientists is more critical than ever.

Out of the 3,143 counties in the U.S., our project focused on 464 counties and three statewide datasets. The CVR data draws from several states, including battleground areas like Wisconsin and Georgia, along with solidly Democratic and Republican states.

Digging into the Data

We took several steps to standardize and validate the CVRs we collected. First, we downloaded the CVR files and ensured they were comparable across different jurisdictions. CVR formats often vary by the type of voting machine used in each county, so we had to clean up the data.

Next, we assigned unique identifiers to each voter based on their CVR. In most cases, each voter received one identifier for all their choices. In a few counties where ballots were double-sided, we had to match pages to ensure each voter's choices were kept together.
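The page-matching step can be sketched as follows. This is only an illustration: it assumes each raw record carries a key shared by all pages of the same ballot (called `sheet_key` here, a hypothetical name), which may not match how any particular voting system labels its pages.

```python
from itertools import count

# Hypothetical raw records: (county, sheet_key, page, office, choice).
raw = [
    ("A", "0001", 1, "President", "Biden"),
    ("A", "0001", 2, "State House", "Smith"),
    ("A", "0002", 1, "President", "Trump"),
    ("B", "0001", 1, "President", "Trump"),
]

def assign_voter_ids(raw):
    """Give every (county, sheet_key) pair one id so a voter's pages stay together."""
    next_id = count(1)
    ids = {}
    out = []
    for county, sheet_key, _page, office, choice in raw:
        key = (county, sheet_key)  # the same key in another county is a different voter
        if key not in ids:
            ids[key] = next(next_id)
        out.append((ids[key], county, office, choice))
    return out

records = assign_voter_ids(raw)
```

Including the county in the key matters because sheet numbering typically restarts in each jurisdiction.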

Then, we checked the CVRs against official data to identify discrepancies. We limited our analysis to six races: President, Governor, U.S. Senate, U.S. House, State Senate, and State House. Only counties with less than a 1% difference between our CVRs and the official results were released.
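A release rule of this kind could look like the sketch below. The text above does not say exactly how the 1% difference is measured, so the choice here (largest relative gap across candidates) is an assumption, and the vote totals are invented.

```python
def max_relative_diff(cvr_totals, official_totals):
    """Largest |cvr - official| / official across candidates in the official count."""
    worst = 0.0
    for candidate, official in official_totals.items():
        if official == 0:
            continue  # skip candidates with no official votes to avoid dividing by zero
        diff = abs(cvr_totals.get(candidate, 0) - official) / official
        worst = max(worst, diff)
    return worst

def releasable(cvr_totals, official_totals, threshold=0.01):
    """True when every candidate's CVR total is within `threshold` of the official total."""
    return max_relative_diff(cvr_totals, official_totals) < threshold

# Made-up totals for two hypothetical counties.
official = {"Biden": 10_000, "Trump": 10_000}
county_ok = releasable({"Biden": 10_020, "Trump": 9_950}, official)
county_bad = releasable({"Biden": 9_000, "Trump": 10_000}, official)
```

In this example the first county passes (its worst gap is 0.5%) and the second fails (a 10% gap for one candidate).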

The Dataset Breakdown

Our dataset is a hefty one, containing over 166 million rows. Each row indicates a choice made by a voter for a specific contest. We followed naming conventions to maintain consistency. The dataset includes various types of votes, like undervotes (no choice) and overvotes (more choices than allowed).
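The one-row-per-choice layout can be pictured with a small sketch. The field names and the "UNDERVOTE" and "OVERVOTE" labels below are illustrative assumptions, not the dataset's documented column names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChoiceRow:
    voter_id: int
    office: str
    choice: str  # candidate name, "UNDERVOTE", or "OVERVOTE"

def to_rows(voter_id, office, marks, allowed=1):
    """Turn the marks a voter made in one contest into long-format rows."""
    if len(marks) == 0:
        return [ChoiceRow(voter_id, office, "UNDERVOTE")]  # contest left blank
    if len(marks) > allowed:
        return [ChoiceRow(voter_id, office, "OVERVOTE")]   # too many marks, no valid vote
    return [ChoiceRow(voter_id, office, m) for m in marks]

ballot = (
    to_rows(1, "President", ["Biden"])
    + to_rows(1, "US Senate", [])
    + to_rows(1, "US House", ["Smith", "Jones"])
)
```

In this simplified version, a multi-seat contest that receives fewer marks than allowed simply yields one row per mark; a fuller treatment might record the unused choices separately.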

We also considered privacy issues while preparing the data. We took care to avoid revealing specific voter choices that could be traced back to individual voters. For instance, if too few votes were cast in a specific precinct, we combined data with other precincts to protect anonymity.
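One simple way to pool small precincts is sketched below. The threshold of 10 ballots and the single "COMBINED" group are assumptions chosen for illustration; the actual rule used for the database may differ.

```python
def merge_small_precincts(counts, min_ballots=10):
    """Map each precinct to a reporting group, pooling precincts below `min_ballots`."""
    small = [p for p, n in counts.items() if n < min_ballots]
    mapping = {p: p for p in counts}
    if not small:
        return mapping
    pooled = sum(counts[p] for p in small)
    # Remaining precincts, smallest first, in case the pool itself is still too small.
    large = sorted((p for p in counts if p not in small), key=lambda p: counts[p])
    while pooled < min_ballots and large:
        extra = large.pop(0)
        small.append(extra)
        pooled += counts[extra]
    for p in small:
        mapping[p] = "COMBINED"
    return mapping

groups = merge_small_precincts({"P1": 6, "P2": 7, "P3": 500})
```

Here P1 and P2 are reported together while P3 keeps its own label. If the pooled precincts still fall short of the threshold, the smallest remaining precinct is folded in as well.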

Geographic Coverage

Our database includes 20 states. Each included county's data is available for analysis. However, not every county in every state is represented. The dataset provides a snapshot of voter trends and characteristics, with demographic comparisons to the overall U.S. population.

Validating the Data

To ensure our CVRs were accurate, we conducted thorough validations at both the county and precinct levels. We compared vote totals from our CVRs with those reported by official sources.

If we found discrepancies, we looked into the reasons behind them. Sometimes entire precincts were missing from the CVRs, or specific voting methods weren't included. In some instances, counties had to redact certain vote choices to protect the secret ballot.

Learning from the Data

We also provided some guidance on how to work with the dataset. Users can read in the data using programming languages like R or Python. The dataset is structured to make it easy to produce summaries quickly.

For example, researchers can explore party loyalty by looking at how many Republican voters chose Trump, and how many Democrats voted for Biden. By filtering the data based on state and county, users can conduct various analyses freely.
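As a rough illustration of such a summary in Python, the sketch below reads a tiny made-up extract, filters by state, and computes a party loyalty rate. The column names, county labels, and rows are hypothetical, and "loyal" is defined here as voting for one party in every down-ballot contest, which only loosely follows the definition in the abstract.

```python
import csv
import io
from collections import defaultdict

# A tiny made-up extract; the real files and column names may differ.
SAMPLE = """state,county,voter_id,office,party
WI,A,1,President,DEM
WI,A,1,US House,REP
WI,A,1,State House,REP
WI,A,2,President,REP
WI,A,2,US House,REP
WI,A,2,State House,REP
GA,B,3,President,DEM
GA,B,3,US House,DEM
"""

DOWN_BALLOT = {"US House", "US Senate", "State Senate", "State House"}

def loyalty_rate(rows, party, state=None):
    """Among voters whose down-ballot votes all went to `party`, the share
    who also chose that party for President (None if there are no such voters)."""
    ballots = defaultdict(dict)
    for r in rows:
        if state is None or r["state"] == state:
            ballots[(r["state"], r["county"], r["voter_id"])][r["office"]] = r["party"]
    loyal = total = 0
    for votes in ballots.values():
        down = [p for office, p in votes.items() if office in DOWN_BALLOT]
        if down and all(p == party for p in down) and "President" in votes:
            total += 1
            loyal += votes["President"] == party
    return loyal / total if total else None

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
wi_rep = loyalty_rate(rows, "REP", state="WI")
ga_dem = loyalty_rate(rows, "DEM", state="GA")
```

One minus the loyalty rate gives the share of those voters who did not choose their party's presidential candidate, which is the kind of quantity behind the 1.9% and 1.2% figures quoted earlier.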

Conclusion

The cast vote records from the 2020 U.S. Election have opened a door for research and analysis in political science and election administration. With this dataset, we can uncover voting patterns and behaviors in ways that were previously difficult, if not impossible.

As we move forward, understanding the implications of these records and the trends they reveal will be vital to ensuring trust in the electoral process. With ongoing interest in election integrity, our work with CVRs provides a strong foundation for future research.

So grab your magnifying glass, and let's dive into the numbers!

Original Source

Title: Cast vote records: A database of ballots from the 2020 U.S. Election

Abstract: Ballots are the core records of elections. Electronic records of actual ballots cast (cast vote records) are available to the public in some jurisdictions. However, they have been released in a variety of formats and have not been independently evaluated. Here we introduce a database of cast vote records from the 2020 U.S. general election. We downloaded publicly available unstandardized cast vote records, standardized them into a multi-state database, and extensively compared their totals to certified election results. Our release includes vote records for President, Governor, U.S. Senate and House, and state upper and lower chambers -- covering 42.7 million voters in 20 states who voted for more than 2,204 candidates. This database serves as a uniquely granular administrative dataset for studying voting behavior and election administration. Using this data, we show that in battleground states, 1.9 percent of solid Republicans (as defined by their congressional and state legislative voting) in our database split their ticket for Joe Biden, while 1.2 percent of solid Democrats split their ticket for Donald Trump.

Authors: Shiro Kuriwaki, Mason Reece, Samuel Baltz, Aleksandra Conevska, Joseph R. Loffredo, Can Mutlu, Taran Samarth, Kevin E. Acevedo Jetter, Zachary Djanogly Garai, Kate Murray, Shigeo Hirano, Jeffrey B. Lewis, James M. Snyder, Charles H. Stewart

Last Update: 2024-10-24 00:00:00

Language: English

Source URL: https://arxiv.org/abs/2411.05020

Source PDF: https://arxiv.org/pdf/2411.05020

Licence: https://creativecommons.org/licenses/by-sa/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arXiv for use of its open access interoperability.
