
# Computer Science # Computation and Language # Social and Information Networks

Kenyans Speak Out: Citizen Reports Shape Election Reality

A look at how citizen journalism impacted the 2022 Kenyan General Election.

Roberto Mondini, Neema Kotonya, Robert L. Logan, Elizabeth M Olson, Angela Oduor Lungati, Daniel Duke Odongo, Tim Ombasa, Hemank Lamba, Aoife Cahill, Joel R. Tetreault, Alejandro Jaimes

― 9 min read



In 2022, Kenyans took to the polls for a general election that saw a significant amount of citizen reporting, made possible largely through online platforms where people could share their views and experiences in real time. Imagine a big community notice board where everyone can post their thoughts, complaints, and observations about what's happening around them on election day. That is what citizen journalism looks like today!

Citizens reported issues like official misconduct, irregular vote counts, and even instances of violence. The resulting dataset, Uchaguzi-2022, consists of over 14,000 reports related to the 2022 Kenyan General Election, collected from a platform that let people send in their observations via SMS, social media, and other digital channels. The beauty of this process is that it gives a voice to the common person, allowing them to report what they see and feel as events unfold.

The Importance of Organizing Data

When a flood of information comes in, it’s essential to organize it effectively. Think of it like trying to make sense of a giant jigsaw puzzle — without sorting the pieces first, it's a bit of a mess! Each report was categorized based on specific issues, and the location of each incident was tagged, so it could be mapped out. This organization is vital for authorities and policy makers, helping them gain insights from this information to promote positive changes in society.

The task of organizing all these reports isn't easy and often requires a lot of manual work. It’s like having a mountain of laundry — it takes time and effort to fold and put everything away neatly. That’s why this dataset is significant; it aims to simplify the process by using technology to assist in categorizing and tagging reports.

Citizen Reporting in Action

The online reporting platforms allowed citizens to report issues as they happened. These reports cover a variety of topics, such as complaints about polling station operations, allegations of fraud, and observations on voter behavior. The nature of citizen journalism makes it a powerful tool for shedding light on the realities of elections, especially in places where traditional media might not have access.

However, not every report is trustworthy. Some might be based on mere opinions or rumors rather than facts. This is why it’s crucial for platforms to verify reports. Without verification, unfiltered content can spread misinformation like wildfire. It’s like passing on a rumor about someone; it can easily spiral out of control (and trust us, no one likes being on the receiving end of a rumor!).

To keep track of how reports affect different communities, platforms also categorized them by topic and location. This means that when someone reads a report, they can see how events unfold in their area, thus staying informed about their community. It’s like having a local news channel that broadcasts live updates from your neighborhood, right on your phone.

The Dataset Overview

The dataset contains 14,169 reports related to the 2022 Kenyan General Election, submitted through a system designed specifically for this purpose. Over a period of roughly two months leading up to the election, citizens shared their experiences and observations.

Reports were carefully reviewed by trained volunteers who ensured that the data was accurate and categorized correctly. These volunteers spoke both English and Swahili, which is quite helpful in a country with a rich linguistic diversity! After review, these reports became available to the public, providing valuable insights for journalists, researchers, and citizens alike.

Topic Classification

To make sense of the reports, they were divided into categories based on their topics. Think of topics as the chapters in a book, with each chapter focusing on a different theme. For instance, some reports were about irregularities in voting, while others focused on security issues or administrative tasks at polling stations.

The volunteers assigned topics to each report based on the content. Additionally, specific tags were added to provide even more detail. This is similar to how you might label your boxes when moving; it saves you from opening every single one to find your winter clothes!
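
To picture what a categorized and tagged report might look like under the hood, here is a tiny Python sketch. The field names and example values are invented for illustration and are not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class CitizenReport:
    """Hypothetical structure for a single annotated report (not the real schema)."""
    text: str                                        # the citizen's original message
    topic: str                                       # broad category assigned by a volunteer
    tags: List[str] = field(default_factory=list)    # finer-grained labels
    location: Optional[Tuple[float, float]] = None   # (latitude, longitude) if geotagged

# One invented example of what an annotated record could look like
report = CitizenReport(
    text="Long queues and a broken ballot scanner at our polling station.",
    topic="Polling station administration",
    tags=["delays", "equipment failure"],
    location=(-1.2864, 36.8172),  # placeholder coordinates near central Nairobi
)
print(report.topic, report.tags)
```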

Geo-tagging the Reports

Each report also included a geographic tag, meaning the location where the incident occurred was marked. The volunteers did this by looking for mentions of places in the reports. In cases where no location was specified, a default point in the center of Nairobi was used instead. It's a practical fallback, a bit like writing "home" as your address when you can't recall the exact street.

This geographic information helps in creating a visual map of where events took place. By plotting these reports on a map, one can easily see which areas experienced issues during the election. This can inform discussions and decisions made by various stakeholders, including governments and NGOs.
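
For readers curious how such a map could be built, here is a minimal sketch using the folium library. The coordinates and summaries are made up for illustration; they are not drawn from the dataset, and folium is simply a convenient mapping tool, not one named in the paper.

```python
import folium

# Invented (latitude, longitude, summary) triples standing in for geotagged reports
reports = [
    (-1.2864, 36.8172, "Delayed opening at a Nairobi polling station"),
    (-0.0917, 34.7680, "Observers report calm voting in Kisumu"),
    (-4.0435, 39.6682, "Long queues reported in Mombasa"),
]

# Centre the map roughly on Kenya
m = folium.Map(location=[-0.0236, 37.9062], zoom_start=6)

for lat, lon, summary in reports:
    folium.CircleMarker(location=[lat, lon], radius=5, popup=summary).add_to(m)

m.save("election_reports_map.html")  # open the HTML file in a browser to explore
```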

Challenges with Data Annotation

As you can imagine, sorting through thousands of reports and organizing them isn't a walk in the park. It requires time, attention, and a lot of patience. Manual annotation is labor-intensive, which often leads to delays in making information public.

In the case of the Kenyan election, some 86,000 reports were received but went unannotated due to a lack of resources. This shows just how valuable an automated approach could be for handling large amounts of data.

Report Length and Content

The reports varied in length, with many fitting within a specific character limit due to the platforms used for submissions. This limit is much like how sending a text message has a character cap — it encourages succinctness!

Additionally, since many languages are spoken in Kenya, the dataset captures a mix of languages, including English and Swahili. Some reports even feature code-switching, where writers switch between languages within the same message. It's like watching someone seamlessly blend two favorite dishes into a new, tasty meal!
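
As a rough idea of how one might gauge the language mix, the snippet below runs a few invented messages through the langdetect package. This tooling is an assumption on our part, not something used in the paper, and code-switched text will often trip up such detectors.

```python
from collections import Counter
from langdetect import detect, DetectorFactory

DetectorFactory.seed = 0  # make the detector's results reproducible

# Invented example messages; not actual reports from the dataset
messages = [
    "Voting started late at our station but is now moving smoothly.",
    "Kura zinaendelea vizuri hapa, hakuna shida yoyote.",
    "Agents wame-arrive late, lakini counting inaendelea sawa.",  # code-switched
]

counts = Counter(detect(msg) for msg in messages)
print(counts)  # e.g. Counter({'sw': 2, 'en': 1}); detector output may vary
```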

Geographic Distribution of Reports

When taking a closer look at where these reports came from, it was noted that most originated from Nairobi and its surrounding counties. It’s no surprise that a city bustling with people would generate a lot of reports! In contrast, rural areas had fewer submissions.

This uneven distribution underscores the importance of ensuring that all voices are heard, irrespective of geography. It's a bit like a bustling town square where everyone talks over one another, while quieter villages struggle to be heard at all.

Trends Over Time

The dataset also allows for analyzing trends over time. By examining when reports came in, researchers can see how public sentiment changed throughout the election phases. For example, before the election, people reported on scandals, while on election day, they focused on results and voter turnout.

These trends help in understanding the electoral landscape and can point to key issues that need addressing. It’s like tracking seasons; knowing when a storm hit can help prepare for the next one!
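
A simple way to surface these time trends is to bucket reports by day and count them per topic. The pandas sketch below uses a handful of invented records rather than the dataset's real layout.

```python
import pandas as pd

# Invented example records; a real analysis would load the dataset instead
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-08-05 10:00", "2022-08-05 14:30",
        "2022-08-09 08:15", "2022-08-09 09:40", "2022-08-09 18:05",
    ]),
    "topic": [
        "campaign issues", "campaign issues",
        "polling station administration", "voting irregularities", "vote counting",
    ],
})

# Count reports per day and topic to see how attention shifts over time
daily_by_topic = (
    df.groupby([df["timestamp"].dt.date, "topic"])
      .size()
      .unstack(fill_value=0)
)
print(daily_by_topic)
```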

Evaluating Data Quality

To ensure the quality of the reports, random samples were reviewed by expert annotators to compare their findings with those of the volunteer annotators. This step is crucial to ensure that the information being shared is accurate.

Interestingly, the agreement between volunteers and experts showed some inconsistencies, suggesting that categorizing certain reports is quite subjective, like when someone insists their favorite song is the best ever and you simply don't agree! Some subjectivity is expected given the volume and variety of reports, and it highlights the need for automated systems that can help make the labels more accurate and consistent.
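
Annotator agreement like this is usually summarized with a chance-corrected statistic such as Cohen's kappa. Here is a minimal sketch using scikit-learn, with invented labels standing in for the real annotations.

```python
from sklearn.metrics import cohen_kappa_score

# Invented topic labels for the same ten reports, not the real annotations
volunteer_labels = ["fraud", "violence", "admin", "fraud", "admin",
                    "violence", "admin", "fraud", "admin", "admin"]
expert_labels    = ["fraud", "violence", "admin", "admin", "admin",
                    "violence", "fraud", "fraud", "admin", "admin"]

kappa = cohen_kappa_score(volunteer_labels, expert_labels)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```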

Automating Report Categorization

As with any large dataset, the goal was to explore how language models could help in categorizing and tagging reports efficiently. Using machine learning techniques, the aim was to reduce the manual labor and enhance the speed of processing reports.

This innovative approach can help agencies focus more on understanding the insights drawn from the data rather than just sifting through it. It’s like having a smart assistant that can filter through piles of papers to find just the information you need!
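
One common way to let a language model assign topics without task-specific training is zero-shot classification. The snippet below is a generic sketch using the Hugging Face transformers pipeline; the model and candidate topics are placeholders, not the specific systems or label set evaluated in the paper.

```python
from transformers import pipeline

# Generic zero-shot classifier; the paper's actual models and labels may differ
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

report = "Ballot boxes arrived three hours late and voters are still waiting outside."
candidate_topics = [
    "voting irregularities",
    "security issues",
    "polling station administration",
    "positive events",
]

result = classifier(report, candidate_labels=candidate_topics)
# The pipeline returns topics sorted from most to least likely
print(result["labels"][0], round(result["scores"][0], 2))
```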

Geotagging Automation

Geotagging involves two key tasks: extracting the mentioned locations from the reports and retrieving the coordinates for these locations. If categorizing reports is one side of the coin, geotagging is the other, completing the picture!

Several methods were explored, including using advanced models that can adapt and recognize locations mentioned in reports. Of course, technology sometimes stumbles. There were instances where the location wasn’t found, highlighting the need for continuous improvement in the systems used.
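
To make that two-step idea concrete, here is an illustrative sketch that pairs an off-the-shelf named-entity recognizer (spaCy) with a public geocoder (geopy's Nominatim), falling back to central Nairobi when no place is found. These are stand-in tools chosen for the example, not the systems evaluated in the paper.

```python
import spacy
from geopy.geocoders import Nominatim

nlp = spacy.load("en_core_web_sm")           # small English NER model
geocoder = Nominatim(user_agent="uchaguzi-example")

NAIROBI_CENTRE = (-1.2864, 36.8172)          # fallback when no place is mentioned

def geotag(report_text: str):
    """Return (latitude, longitude) for the first resolvable place, else the fallback."""
    doc = nlp(report_text)
    places = [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]
    for place in places:
        hit = geocoder.geocode(f"{place}, Kenya")
        if hit is not None:
            return (hit.latitude, hit.longitude)
    return NAIROBI_CENTRE

print(geotag("Vote counting was disrupted briefly in Eldoret."))
```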

Results and Findings

The results from the automated categorization and geotagging tasks offer valuable insight into how effective these systems can be. Performance was evaluated using several metrics, covering both the accuracy of the predicted categories and the accuracy and coverage of the location tags.

Interestingly, while larger models showed better performance in identifying locations, challenges persisted in pinpointing specific sites or landmarks. This is very much like trying to find your friend in a crowd; sometimes you just need a little more than a simple description!

Understanding the Related Work

Election studies have been a hot topic, especially in the era of social media. Many research efforts have focused on how elections are analyzed through online platforms. Most studies, however, have taken place in the context of the United States or Europe.

This dataset stands out because it centers specifically around the Kenyan elections, contributing a fresh perspective to the dialogue about electoral integrity. It leans heavily into citizen contributions, prioritizing the voices of everyday people in this critical democratic process.

Ethical Considerations

When dealing with data, especially concerning individuals and events, ethical considerations are paramount. The data in this dataset was collected from publicly available sources, ensuring that no confidential information was shared.

Moreover, researchers who wish to access this dataset must adhere to a licensing agreement that prohibits misuse. This measure helps protect the integrity of the data and ensures that it is used for the right purposes.

Conclusion

The dataset of citizen reports on the 2022 Kenyan Election is a powerful resource for understanding public sentiment and issues surrounding elections. With 14,169 reports collected, it reflects a diverse array of opinions and experiences.

As we move forward, the automation of categorization and tagging can greatly enhance how data is processed, allowing for quicker responses to emerging issues.

In the grand scheme of things, citizen reporting and the use of technology can help bolster democracy by ensuring that everyone’s voice is heard – even if it means sorting through a messy laundry basket of opinions and observations! As citizens continue to participate and share their stories, we can only hope that these efforts lead to fairer and more transparent elections in the future.

Original Source

Title: Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election

Abstract: Online reporting platforms have enabled citizens around the world to collectively share their opinions and report in real time on events impacting their local communities. Systematically organizing (e.g., categorizing by attributes) and geotagging large amounts of crowdsourced information is crucial to ensuring that accurate and meaningful insights can be drawn from this data and used by policy makers to bring about positive change. These tasks, however, typically require extensive manual annotation efforts. In this paper we present Uchaguzi-2022, a dataset of 14k categorized and geotagged citizen reports related to the 2022 Kenyan General Election containing mentions of election-related issues such as official misconduct, vote count irregularities, and acts of violence. We use this dataset to investigate whether language models can assist in scalably categorizing and geotagging reports, thus highlighting its potential application in the AI for Social Good space.

Authors: Roberto Mondini, Neema Kotonya, Robert L. Logan, Elizabeth M Olson, Angela Oduor Lungati, Daniel Duke Odongo, Tim Ombasa, Hemank Lamba, Aoife Cahill, Joel R. Tetreault, Alejandro Jaimes

Last Update: 2024-12-17 00:00:00

Language: English

Source URL: https://arxiv.org/abs/2412.13098

Source PDF: https://arxiv.org/pdf/2412.13098

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
