Analyzing Research on Sustainable Development Goals

A system to analyze literature on the SDGs and their trends.

The world has its fair share of issues that make life a bit harder for everyone. In 2015, the United Nations came up with a list called the Sustainable Development Goals (SDGs) to help tackle these problems by 2030. These goals cover a range of important topics like gender equality, education, poverty, health, and climate change. They all aim to create a better and sustainable future for everyone.

Making Sense of Research on SDGs

With so much research being published, sorting through it all can be overwhelming. That's where natural language processing (NLP) comes in. NLP techniques let us sift through academic papers to figure out what researchers are saying about the SDGs. We built a system that does just that. It fetches research papers, identifies common topics in them, and provides insights into how attitudes toward the SDGs have shifted over time.

How the System Works

  1. Fetching Data: We retrieve information from the Scopus database, which is a treasure trove of academic papers. We focus on five groups of SDGs to make our work more manageable.

  2. Finding Topics: Once we have the data, we use a method called topic modeling. This fancy term simply means a statistical technique that looks for patterns in large collections of text to determine the main topics people talk about.

  3. Exploring Topics: We allow users to explore these topics through easy keyword searches. Users can also see how the frequency of these topics changes over time.

The Heavy Lifting: Topic Modeling

We employ a tool called BERTopic, which is like a superhero for topic modeling. This tool helps us look at hundreds of thousands of papers and identify hundreds of topics. We’ve made two improvements to it. First, we use a new LLM-based way of representing the abstracts as numbers, so that similar texts end up close together. Second, we created an optimizer that searches for the best settings (hyperparameters) for any new large dataset, making our work more efficient.

Why Use Large Language Models?

In simple terms, we use something powerful for our text analysis: large language models (LLMs). These are advanced models trained on huge amounts of text. We use them to compute embeddings, which are numerical representations of the content of academic abstracts, and that makes our insights more meaningful.

Visualizing the Results

Once we've processed everything, we present the findings through interactive dashboards. Users can see how different topics have developed over the years and explore keywords related to those topics. It’s like having a time machine for academic research!

Understanding the Sustainable Development Goals

The SDGs are a list of 17 goals that target pressing global issues. These goals are not just random ideas; they are carefully designed to work together to improve life on Earth. We have grouped these goals into five main areas to make our analysis clearer:

  1. Basic Human Needs and Well-being
  2. Environmental Sustainability
  3. Economic Development and Employment
  4. Equality and Social Inclusion
  5. Global Partnerships and Peace

By categorizing them, we can more easily identify trends and topics in the literature.

Gathering the Data

We access the Scopus database because it’s one of the largest sources of academic papers. When we pull this data, we specifically look for English abstracts that cover our five areas of interest. We then clean the data to ensure quality, which means we remove duplicates and keep only relevant information.

The Cleaning Process

When we gather all this information, it’s important to sift through it carefully. We check for missing details like titles and publication dates, and we make sure there’s no duplicate content. This process ensures we have high-quality data to work with.
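To make the cleaning step concrete, here is a minimal sketch in Python. It is not the actual TETYS code: the `Record` class, its field names, and the rule that two abstracts count as duplicates when they match after lowercasing and collapsing whitespace are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical record layout; the real Scopus fields may differ.
    title: Optional[str]
    abstract: Optional[str]
    year: Optional[int]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def clean(records: list[Record]) -> list[Record]:
    """Drop records with missing fields, then drop repeated abstracts."""
    seen: set[str] = set()
    kept: list[Record] = []
    for rec in records:
        if not rec.title or not rec.abstract or rec.year is None:
            continue  # missing title, abstract, or publication year
        key = normalize(rec.abstract)
        if key in seen:
            continue  # an equivalent abstract was already kept
        seen.add(key)
        kept.append(rec)
    return kept


raw = [
    Record("Water access", "Clean water improves health.", 2019),
    Record("Water access (copy)", "clean  water improves HEALTH.", 2019),
    Record(None, "Solar power adoption is growing.", 2021),
    Record("Solar power", "Solar power adoption is growing.", 2021),
]
cleaned = clean(raw)
print([rec.title for rec in cleaned])
```

In this toy input, the second record is dropped as a duplicate and the third because its title is missing, so two records survive.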

The TETYS Pipeline

Our system, which we’ve creatively named TETYS (Topics Evolution That You See), is made up of two parts:

  1. Building the Topic Model: This is where we create a solid understanding of the topics based on the research papers.

  2. Exploring and Visualizing: This part lets users interact with the findings. They can see word clouds, search for specific keywords, and even compare the changes in topics over time.

Topic Modeling Explained

At its core, topic modeling is about grouping documents that discuss similar themes. It’s like putting together a collection of books on the same subject. The process follows these steps:

  1. Convert documents into data: We turn the text of each abstract into a list of numbers (an embedding) that we can analyze.
  2. Reduce complexity: We shrink those long lists of numbers down to fewer dimensions so the data is manageable.
  3. Group similar documents: We cluster the data based on similarities.
  4. Tokenize: We break the text in each group down into individual words.
  5. Identify important words: We score which words are most significant in the context of each topic.
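The last two steps are the easiest to show in a few lines. The sketch below assumes the documents have already been grouped into clusters (here by hand), then tokenizes them and ranks words with a class-based TF-IDF style weight: a word scores high when it is frequent inside one cluster but rare across all clusters. This is a simplified, from-scratch illustration in the spirit of BERTopic's approach, not the library's own code, and the function names and sample texts are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(clusters: dict[int, list[str]], k: int = 2) -> dict[int, list[str]]:
    """Rank the words of each cluster with a class-based TF-IDF style weight."""
    # Treat every cluster as one long document and count its words.
    counts = {
        cid: Counter(word for doc in docs for word in tokenize(doc))
        for cid, docs in clusters.items()
    }
    total: Counter = Counter()
    for cluster_counts in counts.values():
        total.update(cluster_counts)
    avg_words = sum(total.values()) / len(counts)  # average words per cluster

    result: dict[int, list[str]] = {}
    for cid, cluster_counts in counts.items():
        size = sum(cluster_counts.values())
        weights = {
            word: (n / size) * math.log(1 + avg_words / total[word])
            for word, n in cluster_counts.items()
        }
        # Highest weight first; ties are broken alphabetically for stable output.
        ranked = sorted(weights, key=lambda word: (-weights[word], word))
        result[cid] = ranked[:k]
    return result


# Hand-made clusters standing in for the output of the grouping step.
clusters = {
    0: ["solar energy and wind energy", "solar energy policy"],
    1: ["gender equality in education", "education and gender gaps"],
}
topics = top_words(clusters, k=2)
print(topics)
```

Note how the word "and" appears in both clusters and therefore never makes it into the top words, while "energy" and "education" do.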

Using the Dashboard

With the TETYS dashboard, users can select any of the five macro-areas we mentioned earlier. They can search for keywords or simply view trending topics. The dashboard provides different views, including individual topics and comparisons between them.
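A keyword search over topics can be as simple as checking which topics have a matching keyword. The snippet below is a hypothetical sketch, not the dashboard's real search: the topic IDs, the keyword lists, and the case-insensitive substring matching rule are all assumptions.

```python
def search_topics(topics: dict[int, list[str]], query: str) -> list[int]:
    """Return the IDs of topics with at least one keyword containing the query."""
    needle = query.strip().lower()
    if not needle:
        return []  # an empty query matches nothing
    return sorted(
        topic_id
        for topic_id, keywords in topics.items()
        if any(needle in keyword.lower() for keyword in keywords)
    )


# Made-up topics for one macro-area.
example_topics = {
    12: ["renewable energy", "solar", "wind"],
    40: ["clean water", "sanitation"],
    77: ["energy poverty", "electricity access"],
}
print(search_topics(example_topics, "energy"))
print(search_topics(example_topics, " Water "))
```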

The Topic Comparison Feature

Users can pick multiple topics and see how they stack up against each other in terms of how often they appear in the literature over time. This feature allows for a more dynamic investigation into trends.
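Behind a comparison like this sits a time series per topic. Here is a minimal sketch of how one could be computed, assuming each paper has already been given a publication year and a single topic ID. Expressing frequency as a share of that year's papers (rather than a raw count) is our assumption for the example, and the data is invented.

```python
from collections import Counter, defaultdict


def topic_frequency_by_year(
    assignments: list[tuple[int, int]],
) -> dict[int, dict[int, float]]:
    """Map each topic ID to {year: share of that year's papers on the topic}."""
    papers_per_year = Counter(year for year, _ in assignments)
    papers_per_year_topic = Counter(assignments)
    series: dict[int, dict[int, float]] = defaultdict(dict)
    for (year, topic_id), n in papers_per_year_topic.items():
        series[topic_id][year] = n / papers_per_year[year]
    # Sort years so each series reads chronologically.
    return {topic_id: dict(sorted(s.items())) for topic_id, s in series.items()}


# (publication year, topic ID) for seven made-up papers.
papers = [
    (2020, 1), (2020, 1), (2020, 2),
    (2021, 1), (2021, 2), (2021, 2), (2021, 2),
]
series = topic_frequency_by_year(papers)
print(series[1])
print(series[2])
```

In this toy data, topic 1 shrinks from two thirds of the papers in 2020 to a quarter in 2021, while topic 2 grows to three quarters. Plotting both series on the same axes gives the kind of comparison described above.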

Results and Findings

As we ran our analyses, we identified a hefty number of topics across the five macro-areas:

  • Basic Human Needs and Well-being: 550 topics
  • Environmental Sustainability: 856 topics
  • Economic Development and Employment: 181 topics
  • Equality and Social Inclusion: 136 topics
  • Global Partnerships and Peace: 167 topics

The number of topics correlates with the amount of research available in each area. For instance, the first two areas had a lot of research, resulting in many identified topics.

Insights from Each Macro-Area

With our analysis, we were able to connect certain topics to the specific SDGs. For example:

  • In Basic Human Needs, topics related to clean water and health emerged prominently.
  • In Environmental Sustainability, discussions about renewable energy and clean transportation were highlighted.
  • For Economic Development, research about job growth and financial stability was significant.
  • Equality and Social Inclusion focused on issues of gender-based violence and reduced inequality.
  • Finally, Global Partnerships featured diverse topics, showing the multitude of ways partnerships can be approached.

Quality Check: Evaluating the Results

Every model needs to be checked for quality. For our topic modeling results, we compared our current approach to a previous one, looking for improvements in how well each method identified topics.

Manual Evaluation

We performed manual checks on a sample of abstracts to see if our new configurations were actually better. Two trained evaluators assessed the results using two measures:

  • Precision: Of the topics the model assigned, how many were correct?
  • Recall: Of the topics that truly applied, how many did the model find?

Our new model scored better in these evaluations, meaning it was generally better at identifying topics accurately.
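For readers who like to see the arithmetic, here is a small sketch of how precision and recall can be computed for a single abstract. The topic labels are invented, and treating the evaluators' judgment as a set of "relevant" topics is an assumption about the setup, not a description of the exact protocol used.

```python
def precision_recall(predicted: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute precision and recall for one abstract's topic assignments."""
    hits = len(predicted & relevant)  # topics that are both assigned and correct
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Topics the model assigned versus topics the evaluators judged correct.
predicted = {"clean water", "sanitation", "solar"}
relevant = {"clean water", "sanitation", "public health", "hygiene"}

precision, recall = precision_recall(predicted, relevant)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here two of the three assigned topics are correct (precision of about 0.67), and two of the four relevant topics were found (recall of 0.50).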

The Power of Visualization

Visuals can help make sense of complex data. Our dashboard uses different forms of visualization to present findings about the trends in the literature. Word clouds, topic frequency graphs, and more allow users to grasp the information quickly.

User-Friendly Exploration

Our dashboard is designed for easy navigation. Users can explore topics by selecting a macro-area, inputting keywords, or looking at trending topics. The information is presented in a clear and informative way, making it accessible to anyone interested.

Conclusion

In summary, the TETYS system allows for a comprehensive analysis of research literature related to the Sustainable Development Goals. By combining LLM-based text representations, topic modeling, and interactive dashboards, we can identify topics and explore how they trend over time.

This setup not only enhances our understanding of the literature but also makes it easy for users to engage with the data in a meaningful way. Whether researchers, students, or professionals, everyone can benefit from these insights as we collectively work toward a better future.

And remember, if saving the world was easy, we’d all be doing it by now! So let’s keep exploring, analyzing, and learning together!

Original Source

Title: Capturing research literature attitude towards Sustainable Development Goals: an LLM-based topic modeling approach

Abstract: The world is facing a multitude of challenges that hinder the development of human civilization and the well-being of humanity on the planet. The Sustainable Development Goals (SDGs) were formulated by the United Nations in 2015 to address these global challenges by 2030. Natural language processing techniques can help uncover discussions on SDGs within research literature. We propose a completely automated pipeline to 1) fetch content from the Scopus database and prepare datasets dedicated to five groups of SDGs; 2) perform topic modeling, a statistical technique used to identify topics in large collections of textual data; and 3) enable topic exploration through keywords-based search and topic frequency time series extraction. For topic modeling, we leverage the stack of BERTopic scaled up to be applied on large corpora of textual documents (we find hundreds of topics on hundreds of thousands of documents), introducing i) a novel LLM-based embeddings computation for representing scientific abstracts in the continuous space and ii) a hyperparameter optimizer to efficiently find the best configuration for any new big datasets. We additionally produce the visualization of results on interactive dashboards reporting topics' temporal evolution. Results are made inspectable and explorable, contributing to the interpretability of the topic modeling process. Our proposed LLM-based topic modeling pipeline for big-text datasets allows users to capture insights on the evolution of the attitude toward SDGs within scientific abstracts in the 2006-2023 time span. All the results are reproducible by using our system; the workflow can be generalized to be applied at any point in time to any big corpus of textual documents.

Authors: Francesco Invernici, Francesca Curati, Jelena Jakimov, Amirhossein Samavi, Anna Bernasconi

Last Update: Nov 11, 2024

Language: English

Source URL: https://arxiv.org/abs/2411.02943

Source PDF: https://arxiv.org/pdf/2411.02943

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
