Improving Expert Finding in Politics with LDA

A study on using LDA for effective political expert recommendations.

In many political organizations, such as parliaments, people often need to find politicians who are knowledgeable about specific topics. To do this, we first need to create profiles of these politicians, which include their areas of interest. This information can be gathered automatically from their speeches. Since a politician can be an expert in several fields, we can create subprofiles for each area of expertise.

This study introduces a new way to create these profiles using a method called Latent Dirichlet Allocation (LDA). LDA helps identify the main topics discussed in political speeches and organizes related terms into different topic-based subprofiles. To accomplish this, we used fifteen distance and similarity measures to figure out the best number of topics discussed in a speech. It turns out that these measures generally condense down to five strategies: Euclidean, Dice, Sorensen, Cosine, and Overlap. Our tests showed that the accuracy scores from the proposed strategies were usually better than those of standard methods used for expert recommendations, and using an appropriate number of topics was crucial.

The Importance of Expert Finding

The greater context of this work is content-based recommendation systems that suggest items to users based on their textual descriptions and individual preferences. When it comes to recommending people, we are specifically looking for the best individuals to handle certain tasks or issues. In our case, these individuals are politicians who are experts in certain areas.

For example, a Member of Parliament (MP) sitting on the Agriculture Committee should have a deep understanding of various agricultural issues, such as relevant laws, problems, initiatives, subsidies, and types of crops in different locations. The same applies to MPs serving on other committees that focus on health, culture, the economy, education, and more.

When someone faces a specific problem, like excessive heat in classrooms at the end of the school year, or seeks information about rising noise levels during the night in residential areas, the first step is to identify the right person to contact. One approach could be to use general search engines to find lists of politicians, but this can be time-consuming and inefficient since the information is scattered and unreliable. Alternatively, a specialized expert-finding system can store textual information about politicians, allowing users to submit queries and receive a list of relevant MPs. This system can help users easily reach the right politician who can assist them with their issues.

The textual information about each expert includes their interests and areas of expertise, which can be obtained from various sources, such as reports, documents, and transcriptions of their speeches in parliamentary debates. By analyzing this information, we can learn about the experts based on what they say.

To recommend the right experts, we must represent their areas of expertise in a clear way. The most common way to do this is using terms that describe their interests and expertise. When a candidate has diverse interests, for instance, in health, education, and environment, it may not make sense to combine them all into one profile. This might lead to underrepresentation of certain topics. By separating them into more focused subprofiles, we can provide clearer and more useful representations of their expertise.

Breaking Down Profiles for Better Recommendations

The goal of this paper is to find a method to break down a single, diverse profile (created from all the terms gathered from a politician's speeches) into multiple focused subprofiles. By accurately determining a candidate's interests, we can offer better recommendations.

To achieve this, we will employ LDA to identify topics within the documents associated with politicians. A previous study approached the same problem using clustering techniques instead of topic models.

In this study, we use LDA differently from the way it is conventionally combined with expert finding. Most approaches represent documents and profiles as term vectors (bag-of-words), and when a topic model such as LDA is involved, it is typically used to shift the representation from terms to topics. We instead use LDA to separate documents into subdocuments linked to different topics while keeping them in the term space rather than converting them into the topic space. The subdocuments belonging to the same topic are then combined to form the subprofiles. Since this can produce an overwhelming number of subprofiles for some candidates, some of them containing very few terms, we have also created a method that simplifies the result by selecting only the most relevant topics.

The focus of our study centers around the effectiveness of LDA in creating expert subprofiles in a political context. The main contributions include:

  1. Investigating how LDA can generate multiple focused term subprofiles for expert finding within a political setting.
  2. Proposing a strategy to divide documents into thematic subdocuments by distributing terms based on the LDA-generated matrices.
  3. Developing a systematic approach to assign an optimal selection of topics to each document based on distance and similarity measures.
  4. Conducting extensive tests comparing our proposals with several baseline models.

Related Work

Expert finding methods aim to connect individuals with specific areas of expertise, and there has been a growing interest in these systems, with many applications, including:

  • Assigning reviewers to submitted papers for conferences or journals.
  • Identifying suitable collaborators for projects.
  • Finding experts in academic environments, social media, organizations, or the broader web.

In the political domain, earlier work on expert finding includes studies by the authors of this paper.

Two fundamental approaches in expert finding are:

  1. Profile-based methods, which build a profile for each expert by combining relevant documents.
  2. Document-based methods, which preserve documents related to an expert as individual entities and retrieve relevant documents based on user queries.

In our case, we will employ a document-based approach, as the documents relate to individual speeches of MPs. While document-based methods generally perform better, some studies have shown mixed results.

Our work focuses on topic models, especially LDA. Among existing topic-model methods, many use probabilistic latent semantic analysis (pLSA) in community question answering (CQA) systems. The pLSA model can either represent users based on aggregated topic distributions of their questions or as documents reflecting the questions related to a user.

In document-based models, the probabilities of query terms are commonly estimated using maximum likelihood and Dirichlet smoothing. However, some methods have integrated LDA-learned topics from document collections into user representations, enhancing the expert finding process.

Several other topic models exist, such as the Author-Persona-Topic (APT) model, which can recommend reviewers for submitted papers by representing each author with a distribution over hidden topics reflecting various roles.

The aim of our study is to explore a specialized approach that focuses on creating homogeneous subprofiles from the MPs' speeches.

The Process of Expert Finding Using Speech Analysis

Let us consider a situation where we have a group of potential expert candidates and a collection of documents associated with them. In our case, the candidates will be MPs, and each document is linked to their speeches in parliamentary debates.

Our aim is to break down the diverse profile containing terms from all documents related to an MP into more focused thematic subprofiles. To do this, we will first apply LDA to identify the various topics within the document collection.

When LDA is applied to a document collection, it generates two matrices:

  • A term-topic matrix, in which each entry gives the probability of a term being associated with a topic.
  • A topic-document matrix, in which each entry gives the probability of a topic being linked to a document.

Once LDA identifies the topics, the next step is to separate each document into multiple subdocuments based on the different topics discussed.

For example, if a document addresses two topics, say "Health" and "Education," the terms relevant to health should primarily go into one subdocument, while terms related to education should go into another. However, some terms may relate to multiple topics, which complicates the allocation process.

Our proposed method distributes the occurrences of each term across the subdocuments based on the probabilities derived from LDA. We calculate these probabilities using the relationship between terms, documents, and topics.
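The distribution step can be illustrated with a small sketch. It assumes that the share of a term's occurrences sent to each topic is proportional to P(term | topic) × P(topic | document), which is the posterior of the topic given the term and the document under LDA's independence assumptions. The article does not spell out the exact formula, so this weighting, the function name `split_document`, and the toy numbers are all illustrative assumptions rather than the authors' implementation.

```python
def split_document(term_counts, term_given_topic, topic_given_doc):
    """Split one document's term counts into per-topic subdocuments.

    term_counts:      {term: count} for a single document
    term_given_topic: {topic: {term: P(term | topic)}}
    topic_given_doc:  {topic: P(topic | document)}
    Returns {topic: {term: fractional count}}, still in the term space.
    """
    subdocs = {topic: {} for topic in topic_given_doc}
    for term, count in term_counts.items():
        # Unnormalised weight of each topic for this term in this document
        # (assumed weighting: P(term | topic) * P(topic | document)).
        weights = {
            topic: term_given_topic[topic].get(term, 0.0) * p_topic
            for topic, p_topic in topic_given_doc.items()
        }
        total = sum(weights.values())
        if total == 0.0:
            continue  # term unknown to the model: skipped in this sketch
        for topic, weight in weights.items():
            if weight > 0.0:
                subdocs[topic][term] = count * weight / total
    return subdocs


# Toy speech touching two topics; all probabilities are made up.
term_counts = {"hospital": 4, "school": 2, "budget": 2}
term_given_topic = {
    "health": {"hospital": 0.6, "budget": 0.2},
    "education": {"school": 0.5, "budget": 0.2},
}
topic_given_doc = {"health": 0.5, "education": 0.5}

subdocs = split_document(term_counts, term_given_topic, topic_given_doc)
for topic, terms in subdocs.items():
    print(topic, terms)
```

In this toy case, "hospital" goes entirely to the health subdocument and "school" to the education one, while the two occurrences of "budget" are shared between both, which is the situation described above for terms that relate to several topics.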

After applying the separation process, we will merge the subdocuments linked to the same topics to create the candidates' subprofiles. Although we might generate a high number of subprofiles through this method, we can apply a strategy to reduce the number of subprofiles by selecting only relevant topics linked to each document.
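As a rough sketch of the merging step, the following code groups subdocuments by MP and topic and sums their term weights. The input format (a list of `(mp, topic, terms)` tuples), the function name `build_subprofiles`, and the sample MPs are hypothetical; the sketch also assumes that any topic selection has already been applied, so only subdocuments for the retained topics are passed in.

```python
from collections import defaultdict


def build_subprofiles(subdocuments):
    """Merge subdocuments that share an MP and a topic into one subprofile.

    subdocuments: iterable of (mp, topic, {term: weight}) tuples.
    Returns {(mp, topic): {term: summed weight}}.
    """
    profiles = defaultdict(lambda: defaultdict(float))
    for mp, topic, terms in subdocuments:
        for term, weight in terms.items():
            profiles[(mp, topic)][term] += weight
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(terms) for key, terms in profiles.items()}


# Hypothetical subdocuments produced from three speeches by two MPs.
subdocuments = [
    ("mp_a", "health", {"hospital": 4.0, "budget": 1.0}),
    ("mp_a", "education", {"school": 2.0, "budget": 1.0}),
    ("mp_a", "health", {"hospital": 1.0, "nurse": 3.0}),
    ("mp_b", "education", {"school": 5.0}),
]

subprofiles = build_subprofiles(subdocuments)
for key, terms in sorted(subprofiles.items()):
    print(key, terms)
```

Each resulting subprofile stays in the term space, so it can be indexed and queried like an ordinary bag-of-words document.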

Selecting the Optimal Number of Subdocuments

The number of topics selected can significantly influence the results, so it is important to approach the choice systematically. To do this, we take a probability distribution over the topics and determine the cut-off index that separates the most relevant topics from the rest.

We can utilize various distance and similarity measures to assist us in this task. The primary goal is to find a suitable set of topics that gives us the best performance.

When analyzing different distance and similarity measures, we find several noteworthy metrics, including:

  • The Cosine similarity measure.
  • The Dice coefficient.
  • The Jaccard similarity index.
  • The Euclidean distance.
  • The Overlap coefficient.

Although the fifteen distance and similarity measures are computed in different ways, they generally lead to only five distinct selection strategies.

By applying these strategies to our expert-finding task, we can derive a more accurate number of subprofiles to represent the candidates effectively.
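One plausible way to turn a similarity measure into a selection strategy is sketched below. It ranks a document's topic probabilities in decreasing order and keeps the top k topics, where k maximises the cosine similarity between the ranked distribution and an idealised vector with k ones followed by zeros. The article does not describe the reference vector or the exact procedure, so this reading, together with the function name `select_num_topics`, is an assumption made for illustration only.

```python
import math


def select_num_topics(topic_probs):
    """Return k, the number of top-ranked topics to keep for one document.

    topic_probs: P(topic | document) for every topic, in any order.
    Assumed criterion: maximise the cosine similarity between the ranked
    probabilities and a vector of k ones followed by zeros.
    """
    ranked = sorted(topic_probs, reverse=True)
    norm = math.sqrt(sum(p * p for p in ranked))
    if norm == 0.0:
        raise ValueError("topic_probs must contain a non-zero value")
    best_k, best_sim = 1, -1.0
    cumulative = 0.0
    for k, p in enumerate(ranked, start=1):
        cumulative += p
        # Dot product with the k-ones vector is the cumulative probability;
        # the norm of that vector is sqrt(k).
        sim = cumulative / (math.sqrt(k) * norm)
        if sim > best_sim:
            best_k, best_sim = k, sim
    return best_k


print(select_num_topics([0.48, 0.46, 0.03, 0.03]))  # two dominant topics
print(select_num_topics([0.90, 0.05, 0.05]))        # one dominant topic
```

Swapping the cosine for another measure (for example, a Euclidean distance to be minimised) changes only the line that computes `sim` and the direction of the comparison, which is one way to see how many measures can end up producing the same choice of k.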

Conducting Experiments

The primary aim of this study is to determine whether using LDA to build subprofiles of terms helps improve expert finding in a political context. To validate this, we rely on data derived from the Records of Parliamentary Proceedings. This collection contains speeches from various initiatives discussed in the Andalusian Parliament, including contributions from many MPs.

We divide the documents into training and testing sets. The training set is used to run the LDA and create subprofiles, while the testing set is used to evaluate the system. We repeat this sampling process multiple times to ensure accurate predictions.
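The repeated sampling can be sketched as a series of random train/test partitions. The article gives neither the split ratio nor the number of repetitions, so `test_fraction=0.2`, `repeats=5`, and the function name `repeated_splits` are placeholder assumptions.

```python
import random


def repeated_splits(doc_ids, test_fraction=0.2, repeats=5, seed=0):
    """Yield (train_ids, test_ids) pairs from repeated random partitions.

    test_fraction and repeats are illustrative defaults, not values
    taken from the study.
    """
    doc_ids = list(doc_ids)
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_test = max(1, round(len(doc_ids) * test_fraction))
    for _ in range(repeats):
        shuffled = list(doc_ids)
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


splits = list(repeated_splits(range(10)))
for train_ids, test_ids in splits:
    print(len(train_ids), "training documents,", len(test_ids), "test documents")
```

For each partition, the topic model and subprofiles would be built from the training documents only, and the scores obtained on the test documents would then be averaged over all repetitions.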

To measure the effectiveness of our system, we compute three standard information retrieval metrics: precision, normalized discounted cumulative gain (NDCG) focused on the top ten MPs, and recall based on the total number of relevant MPs.
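The three metrics can be sketched for a single query as follows. The sketch assumes binary relevance (an MP is either relevant or not) and interprets the recall figure as being measured at a cut-off equal to the number of relevant MPs. Both choices, as well as the function names, are assumptions, since the article does not give the exact definitions.

```python
import math


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranked MPs that are relevant."""
    return sum(1 for mp in ranked[:k] if mp in relevant) / k


def recall_at_num_relevant(ranked, relevant):
    """Recall with the cut-off set to the number of relevant MPs."""
    if not relevant:
        return 0.0
    top = ranked[:len(relevant)]
    return sum(1 for mp in top if mp in relevant) / len(relevant)


def ndcg_at_k(ranked, relevant, k=10):
    """Normalised discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, mp in enumerate(ranked[:k])
        if mp in relevant
    )
    ideal = sum(
        1.0 / math.log2(position + 2)
        for position in range(min(k, len(relevant)))
    )
    return dcg / ideal if ideal > 0 else 0.0


# Hypothetical ranking returned for one query, with two relevant MPs.
ranked = ["mp_a", "mp_b", "mp_c", "mp_d"]
relevant = {"mp_a", "mp_c"}

print(precision_at_k(ranked, relevant, k=2))
print(recall_at_num_relevant(ranked, relevant))
print(ndcg_at_k(ranked, relevant, k=4))
```

In an experiment these per-query values would be averaged over all test queries and over the repeated data partitions.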

Analyzing Distribution Strategies

Once we analyze how the intervention terms are distributed among different topics, we can assess how this affects the subprofiles created for each MP. The manner in which we distribute terms can greatly influence the output ranking of MPs, which is essential for effective recommendations.

By examining the size of the subprofiles generated using various distribution strategies, we can observe trends. Specifically, as we increase the number of topics considered, the number of generated subprofiles tends to increase, while the average number of terms in each subprofile tends to decrease.

This observation aligns with expectations: when we categorize MPs' speeches into more specific topics, we can recognize patterns of specialization. This helps create a clearer understanding of each MP's expertise.

By analyzing the presence of tiny subprofiles (those containing fewer than fifty terms), we identify potential issues with representativeness. A high number of these tiny subprofiles can create challenges when determining the most relevant politicians.

Performance Evaluation

After evaluating the effectiveness of different distribution strategies, we compared the performance of various models, including term-based and topic-based baselines as well as deep learning models. Our findings suggest that the term domain tends to yield better results than the topic domain.

When conducting tests on our proposed approaches, we found that the distribution strategies generally outperformed the baselines. However, the number of topics chosen plays an important role in determining the overall effectiveness.

Through various tests, it was evident that while there are unique strengths to each distribution strategy, the Sorensen strategy particularly stood out for creating more homogeneous profiles.

Conclusions and Future Directions

This research illustrates how applying LDA to mine terms from speeches positively impacts expert recommendations in a political context. We showed that a well-structured approach using LDA produces valuable topic-based profiles. The different distribution strategies work effectively to create coherent term distributions across topics.

Moving forward, we aim to explore how temporal aspects can influence the construction of these subprofiles. Additionally, we may look into distributing document terms at the paragraph level instead of just at the term level to better capture the essential topics within a speech. Lastly, we are interested in applying these methodologies across various domains beyond politics for further validation.

Original Source

Title: LDA-based Term Profiles for Expert Finding in a Political Setting

Abstract: A common task in many political institutions (i.e. Parliament) is to find politicians who are experts in a particular field. In order to tackle this problem, the first step is to obtain politician profiles which include their interests, and these can be automatically learned from their speeches. As a politician may have various areas of expertise, one alternative is to use a set of subprofiles, each of which covers a different subject. In this study, we propose a novel approach for this task by using latent Dirichlet allocation (LDA) to determine the main underlying topics of each political speech, and to distribute the related terms among the different topic-based subprofiles. With this objective, we propose the use of fifteen distance and similarity measures to automatically determine the optimal number of topics discussed in a document, and to demonstrate that every measure converges into five strategies: Euclidean, Dice, Sorensen, Cosine and Overlap. Our experimental results showed that the scores of the different accuracy metrics of the proposed strategies tended to be higher than those of the baselines for expert recommendation tasks, and that the use of an appropriate number of topics has proved relevant.

Authors: Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete, Luis Redondo-Expósito

Last Update: 2024-01-19 00:00:00

Language: English

Source URL: https://arxiv.org/abs/2401.10617

Source PDF: https://arxiv.org/pdf/2401.10617

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arXiv for use of its open access interoperability.