Sci Simple


# Computer Science # Computation and Language # Machine Learning

Public Transport Sentiment in Sub-Saharan Africa

Analyzing commuter experiences across Kenya, Tanzania, and South Africa.

Rozina L. Myoya, Vukosi Marivate, Idris Abdulmumin


Public transport plays a crucial role in the daily lives of millions around the world. In Sub-Saharan Africa, bus systems, railways, and mini-bus taxis are vital for commuters. However, these systems often receive less focus compared to other sectors like healthcare or education, leading to challenges in service quality and user experience. Understanding what commuters think about public transport can help improve these systems, but how do we gather and analyze this information efficiently?

With the rise of social media, people are more vocal about their experiences. Platforms like Twitter (now X) have become popular avenues for commuters to share their thoughts and opinions. This provides a rich source of data that can be used to gauge public sentiment. So, let's take a ride into the world of public transport user sentiment, particularly in Kenya, Tanzania, and South Africa!

The Need for User Sentiment Analysis

Why should we care about what users think of public transport? Well, understanding commuter sentiment can lead to better services and improved user experiences. With many people relying on public transportation, it’s essential for transit authorities to know where they are getting it right and where they are falling short.

For instance, if a lot of commuters are voicing concerns over safety, it’s a clear sign that something needs to be done. In contrast, if there's praise for a new bus service, it could be worth expanding that service. Collecting data from social media not only provides real-time feedback but is also cost-effective, as it requires less manpower and resources compared to traditional surveys.

Social Media as a Data Source

Social media platforms are filled with opinions, and they allow users to express their thoughts freely. Commuters frequently share their experiences, whether they’re praising a smooth ride or complaining about lengthy delays. This data can be a goldmine for understanding user sentiment.

However, there are challenges. Tweets can be informal, filled with slang, or even include multiple languages in one post. This is particularly the case in multilingual regions like Sub-Saharan Africa. To make sense of all this, researchers have to use Natural Language Processing (NLP) techniques to sift through the noise.

The Power of NLP

So, what is this NLP thing? Essentially, it’s a branch of artificial intelligence that deals with the interaction between computers and human languages. Using advanced algorithms, NLP can help analyze text data to extract useful insights. In the context of public transport sentiment analysis, NLP can identify whether a tweet expresses a positive, negative, or neutral opinion.

In this study, the researchers used several pre-trained language models designed for African languages, including AfriBERTa, AfroXLMR, AfroLM, and PuoBERTa. Because these models were trained on text in African languages, they are better equipped to understand and analyze tweets written in languages like Swahili, isiZulu, and Setswana.

The Study Layout

Researchers focused on three countries: Kenya, Tanzania, and South Africa. They collected a variety of tweets related to public transport between January 2007 and March 2023 from major cities such as Nairobi, Dar es Salaam, and Johannesburg. By filtering out irrelevant data and focusing on major transport keywords, they aimed to get a clearer picture of commuter sentiments.

The study involved several steps, including data sourcing, processing, analysis, and finally, the application of sentiment analysis models. Each step was crucial in ensuring that the data collected was relevant and insightful.

Data Collection

Data collection involved using specific keywords related to public transport in each country. This included terms that people might use when tweeting about their travel experiences. The researchers specifically focused on metropolitan areas where public transport is a key part of daily commuting.
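To make the keyword step concrete, here is a minimal sketch of how tweets might be filtered by country-specific transport terms. The keyword lists and the function names are assumptions made for illustration; the study's actual search terms are not listed in this summary.

```python
import re

# Hypothetical keyword lists, chosen for illustration only.
TRANSPORT_KEYWORDS = {
    "kenya": ["matatu", "sgr", "bus fare"],
    "tanzania": ["daladala", "brt", "mwendokasi"],
    "south_africa": ["metrorail", "taxi rank", "prasa"],
}

def build_pattern(keywords):
    """Compile a case-insensitive pattern matching any keyword as a whole word or phrase."""
    # Longer phrases go first so they win over shorter overlapping terms.
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)

def filter_tweets(tweets, country):
    """Keep only the tweets that mention at least one keyword for the given country."""
    pattern = build_pattern(TRANSPORT_KEYWORDS[country])
    return [t for t in tweets if pattern.search(t)]

sample = [
    "The matatu fare doubled again this morning",
    "Lovely sunset in Nairobi today",
    "Stuck waiting for the SGR, no announcement",
]
kenya_tweets = filter_tweets(sample, "kenya")
print(kenya_tweets)
```

Whole-word matching keeps the filter from firing on unrelated words that merely contain a keyword, at the cost of missing plurals and misspellings, which a real pipeline would need to handle.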

After gathering a substantial dataset, the researchers turned their attention to processing this information. This step is key, as it ensures that only meaningful data is analyzed, removing anything that doesn’t pertain to the study at hand.

Data Processing

Once the data was collected, it needed to be cleaned and prepared for analysis. This involved several tasks, such as removing punctuation, expanding contractions, and discarding irrelevant words. The aim was to focus on the features of the tweets most likely to reveal user sentiment.
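A cleaning step along these lines could look roughly like the sketch below. The contraction table and the list of words to discard are tiny, made-up stand-ins, and the order of operations is an assumption rather than the study's documented pipeline.

```python
import string

# Illustrative, deliberately tiny lookup tables (assumptions, not the study's lists).
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "isn't": "is not"}
STOP_WORDS = {"the", "a", "an", "is", "it", "to", "and", "of", "this"}

def clean_tweet(text):
    """Lowercase, expand contractions, strip punctuation, and drop stop words."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes first
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [token for token in text.split() if token not in STOP_WORDS]

print(clean_tweet("The bus is late AGAIN, and it's so full I can't get on!"))
```

Contractions are expanded before punctuation is removed; otherwise "can't" would collapse to "cant" and the negation would be harder to recover.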

During this stage, researchers also ran language identification tests to ensure they were analyzing tweets in the right languages. They found that some tweets included a mixture of languages, known as code-switching. This was especially common in a multilingual context, with words from different languages mixed into single tweets.
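As a toy illustration of what code-switching looks like to a program, the sketch below flags a tweet when its words match more than one language's word list. The word lists are hypothetical and far too small for real use; actual language identification relies on trained models, and this is not the tool the researchers used.

```python
# Tiny hypothetical word lists, for illustration only.
LEXICONS = {
    "english": {"the", "bus", "is", "late", "again", "very", "today"},
    "swahili": {"sana", "leo", "gari", "nauli", "imepanda", "tena"},
}

def languages_in(text):
    """Return the set of languages whose word list matches at least one token."""
    tokens = text.lower().split()
    return {lang for lang, words in LEXICONS.items() if any(t in words for t in tokens)}

def is_code_switched(text):
    """A tweet counts as code-switched here if two or more languages are detected."""
    return len(languages_in(text)) > 1

print(is_code_switched("The bus is late tena leo"))
print(is_code_switched("nauli imepanda sana"))
```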

Feature Extraction

After processing the data, researchers used a technique called feature extraction to determine the underlying themes within the tweets. This process involved creating word embeddings, a way of converting words into numerical representations that machines can understand.

Word2Vec produced the embeddings, and K-Means clustering then grouped similar words and terms. This helped the researchers identify common themes in the tweets, such as safety concerns or fare pricing. These extracted features were essential in understanding commuter sentiment across different countries.
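The clustering idea can be sketched in a few lines of plain Python. The two-dimensional vectors below are invented stand-ins for Word2Vec embeddings (real embeddings have many more dimensions and are learned from the tweets), and seeding the centroids with the first k vectors is a simplification, so treat this as an illustration of K-Means rather than the study's setup.

```python
import math

# Hypothetical 2-D vectors standing in for learned Word2Vec embeddings.
EMBEDDINGS = {
    "robbery": (0.90, 0.10),
    "fare": (0.10, 0.90),
    "unsafe": (0.80, 0.20),
    "price": (0.20, 0.80),
    "speeding": (0.85, 0.15),
    "expensive": (0.15, 0.85),
}

def kmeans(vectors, k, iterations=10):
    """Plain K-Means; the first k vectors seed the centroids (a simplification)."""
    centroids = list(vectors[:k])
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

words = list(EMBEDDINGS)
labels = kmeans([EMBEDDINGS[w] for w in words], k=2)
clusters = {}
for word, label in zip(words, labels):
    clusters.setdefault(label, []).append(word)
print(clusters)
```

With these toy vectors, the safety-related words end up in one cluster and the price-related words in the other, which is the kind of theme grouping described above.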

Understanding Commuter Sentiments by Country

Kenya

In the Kenyan dataset, the analysis revealed predominantly negative sentiments. Key themes included safety concerns, particularly in relation to the mini-bus taxi sector (referred to as Matatus). Commuters expressed fears over unpredictable price hikes, potential violent crime incidents, and general safety issues.

The Matatu industry has been under scrutiny for its safety measures, and tweets reflected ongoing frustrations from commuters regarding their experiences. Despite attempts at reform, issues like speeding and harassment of passengers have persisted, leading to a negative view of public transport in Kenya.

Tanzania

In contrast, the sentiment analysis for Tanzania showed mainly positive sentiments. However, this positivity came with a caveat: much of the data was promotional in nature. Tweets often focused on the new Bus Rapid Transit (BRT) system in Dar es Salaam, which was praised for its efficiency.

On the negative side, some tweets concerned fare increases, which highlights the relationship between pricing and sentiment. Public transport operators that want to maintain positive sentiment should be cautious about price changes that could upset commuters.

South Africa

South Africa painted a less rosy picture, with predominantly negative sentiments emerging in the analysis. The main concerns revolved around the deteriorating quality of the public transport system, particularly rail services. Commuters voiced frustrations about vandalism, service failures, and issues related to the government’s transparency in handling public transport challenges.

The negative sentiments reflected broader systemic issues within the transport sector. As commuters expressed their dissatisfaction, it was clear that infrastructure quality and government accountability were top concerns.

Model Testing and Evaluation

Model testing ran on GPUs (graphics processing units), which made it practical to run the analyses. The researchers evaluated several pre-trained models capable of handling the languages present in the datasets.

Through testing and tweaking, the researchers selected the best-performing models based on their F1 score, a metric that combines precision (how many predicted labels were correct) and recall (how many true labels were found) into a single number. This helped make the analysis more reliable.
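For readers curious how an F1 score is computed, here is a small sketch. For one class, F1 is the harmonic mean of precision and recall; the version below averages the per-class scores equally (macro-F1). Whether the study reported macro or weighted F1 is not stated in this summary, so the averaging choice and the example labels are assumptions.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class: the harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Average the per-class F1 scores, giving every class equal weight."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, lab) for lab in labels) / len(labels)

# Made-up gold labels and predictions for six tweets.
y_true = ["neg", "neg", "neg", "pos", "neu", "neu"]
y_pred = ["neg", "neg", "pos", "pos", "neu", "neg"]
print(round(macro_f1(y_true, y_pred), 3))
```

F1 is often preferred over plain accuracy when classes are imbalanced, since a model that always predicts the most common sentiment can look accurate while missing the other classes entirely.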

Key Findings

The findings from this study were telling. The commuter sentiments across the three countries exhibited distinct trends. While Kenya and South Africa faced significant challenges regarding safety and infrastructure, Tanzania’s sentiments appeared more favorable, albeit with some concerns regarding pricing.

The primary concerns across the board were related to the cost of public transport, safety dynamics, and the perceived quality of services. By highlighting these issues, the study provides valuable insights for stakeholders in the public transport sector.

Conclusions and Future Directions

The research underscores the potential of employing NLP techniques to analyze user sentiment in public transport. Social media data can offer valuable insights into commuter experiences, allowing transport providers to make informed decisions about improvements.

Moving forward, there is room for enhanced data collection methods and validation processes. Incorporating more datasets that represent the broader commuter experience can lead to more informed insights. Also, employing advanced techniques like aspect-based opinion mining could help delve deeper into specific areas of concern.

Ethical Considerations

While the research utilized social media data, it prioritized user privacy. All identifiable information, such as usernames and location tags, was removed from the dataset. Protecting the privacy and confidentiality of social media users is crucial, and this study aimed to uphold these ethical standards.
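A simple version of this kind of scrubbing might look like the sketch below. The record layout (the "username", "geo", and "text" fields) and the "@user" placeholder are hypothetical; the summary does not describe how the researchers actually stored or anonymized their data.

```python
import re

# Matches @mentions such as "@some_account".
MENTION = re.compile(r"@\w+")

def anonymize(record):
    """Return a copy of a tweet record without identifying fields or @mentions."""
    # The field names used here are assumptions made for illustration.
    scrubbed = {k: v for k, v in record.items() if k not in {"username", "geo"}}
    scrubbed["text"] = MENTION.sub("@user", record["text"])
    return scrubbed

record = {
    "username": "jane_doe",
    "geo": "Nairobi",
    "text": "@some_operator the train is late again",
}
print(anonymize(record))
```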

Final Thoughts

In the grand scheme of things, user sentiment in public transport is a vital yet often overlooked aspect that can drive real change. By understanding what commuters think and feel, we can work toward better services, improved safety, and ultimately, a more user-friendly public transport experience. After all, everyone deserves a smooth ride!

Original Source

Title: Analysing Public Transport User Sentiment on Low Resource Multilingual Data

Abstract: Public transport systems in many Sub-Saharan countries often receive less attention compared to other sectors, underscoring the need for innovative solutions to improve the Quality of Service (QoS) and overall user experience. This study explored commuter opinion mining to understand sentiments toward existing public transport systems in Kenya, Tanzania, and South Africa. We used a qualitative research design, analysing data from X (formerly Twitter) to assess sentiments across rail, mini-bus taxis, and buses. By leveraging Multilingual Opinion Mining techniques, we addressed the linguistic diversity and code-switching present in our dataset, thus demonstrating the application of Natural Language Processing (NLP) in extracting insights from under-resourced languages. We employed PLMs such as AfriBERTa, AfroXLMR, AfroLM, and PuoBERTa to conduct the sentiment analysis. The results revealed predominantly negative sentiments in South Africa and Kenya, while the Tanzanian dataset showed mainly positive sentiments due to the advertising nature of the tweets. Furthermore, feature extraction using the Word2Vec model and K-Means clustering illuminated semantic relationships and primary themes found within the different datasets. By prioritising the analysis of user experiences and sentiments, this research paves the way for developing more responsive, user-centered public transport systems in Sub-Saharan countries, contributing to the broader goal of improving urban mobility and sustainability.

Authors: Rozina L. Myoya, Vukosi Marivate, Idris Abdulmumin

Last Update: 2024-12-09 00:00:00

Language: English

Source URL: https://arxiv.org/abs/2412.06951

Source PDF: https://arxiv.org/pdf/2412.06951

Licence: https://creativecommons.org/publicdomain/zero/1.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

