Advancements in Named Entity Recognition for Biomedical Applications
New methods enhance the identification of key biomedical terms in research.
― 5 min read
Named Entity Recognition (NER) is a process used in natural language processing to find and classify key pieces of information in text, such as names of people, organizations, and locations. In the biomedical field, NER helps in identifying specific terms related to diseases, genes, and other medical entities from research papers and clinical data.
Different Approaches to NER
There are several methods to perform NER, but three popular ones stand out:
SEQ: This method treats NER as sequence labeling. A linear classifier looks at each word in a sentence and assigns it a label indicating whether the word begins an entity, continues one, or falls outside any entity.
SeqCRF: This method is similar to SEQ but adds a Conditional Random Field (CRF) layer, which models dependencies between neighboring labels. It ensures that the label assigned to one word takes into account the labels of the words next to it.
SpanPred: This approach focuses on segments of text (spans) rather than individual words. It represents each candidate entity by the pair of tokens at its start and end boundaries and scores the span as a whole.
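The two output representations can be contrasted with a small sketch (illustrative code, not the paper's implementation): sequence labeling emits one BIO-style tag per word, while span prediction emits one (start, end, type) triple per entity.

```python
# Convert entity spans into BIO tags, the per-word labels used by
# sequence-labeling models such as SEQ and SeqCRF.
def bio_tags(tokens, entities):
    """entities: list of (start, end, label) with inclusive token indices."""
    tags = ["O"] * len(tokens)  # "O" = outside any entity
    for start, end, label in entities:
        tags[start] = f"B-{label}"          # B = beginning of an entity
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"          # I = inside the same entity
    return tags

tokens = ["BRCA1", "mutations", "cause", "breast", "cancer"]
entities = [(0, 0, "GENE"), (3, 4, "DISEASE")]

print(bio_tags(tokens, entities))
# ['B-GENE', 'O', 'O', 'B-DISEASE', 'I-DISEASE']
print(entities)  # span-prediction output: one triple per entity
```

Note that nested entities are easy to express as triples but awkward as a single BIO sequence, which is one motivation for span-based models.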
These three methods were evaluated on four biomedical NER tasks, covering different languages and text genres:
- GENIA: biomedical entities such as genes and proteins in English research abstracts, including nested (overlapping) mentions
- NCBI-Disease: disease mentions in English research abstracts
- LivingNER: mentions of living organisms in Spanish clinical texts
- SocialDisNER: disease mentions in Spanish tweets
Performance Analysis
Among the methods tested, SpanPred achieved state-of-the-art results on the LivingNER and SocialDisNER tasks, improving the F1 score by 1.3 and 0.6 points respectively. The SeqCRF method also reached state-of-the-art performance on the same two Spanish tasks, with gains of 0.2 and 0.7 F1. The SEQ method remained competitive with the state of the art on LivingNER.
The ability to combine predictions from different models was also investigated. The findings revealed that a simple voting scheme, known as majority voting or MajVote, consistently gave high precision and high F1 across all four datasets. This method allowed the predictions from different models to reinforce each other, producing better results.
Combining Approaches
Two main methods were used to combine the predictions of the models:
Union Method: Here, all predictions made by the models were pooled together. This ensured that no correct prediction was lost, so recall was high, but precision could suffer because incorrect predictions from any model were also kept.
MajVote Method: This classic approach kept only the predictions that received votes from a majority of the models. As a result, it tended to return predictions that were more likely to be correct, leading to higher precision.
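Both combination rules are simple set operations over the models' predicted spans. A minimal sketch (the prediction data is hypothetical, not from the paper):

```python
from collections import Counter

def union_combine(predictions):
    """Union: keep every span predicted by any model (favors recall)."""
    combined = set()
    for spans in predictions:
        combined |= set(spans)
    return combined

def majority_vote(predictions):
    """MajVote: keep spans predicted by more than half of the models
    (favors precision)."""
    counts = Counter(span for spans in predictions for span in set(spans))
    threshold = len(predictions) / 2
    return {span for span, n in counts.items() if n > threshold}

# Hypothetical outputs of SEQ, SeqCRF, and SpanPred for one sentence;
# each span is (start, end, type).
seq      = [(0, 1, "DISEASE"), (5, 6, "DISEASE")]
seqcrf   = [(0, 1, "DISEASE")]
spanpred = [(0, 1, "DISEASE"), (8, 9, "DISEASE")]

print(union_combine([seq, seqcrf, spanpred]))
# all three spans survive, including the two single-model candidates
print(majority_vote([seq, seqcrf, spanpred]))
# {(0, 1, 'DISEASE')}: only the span that two of three models agree on
```

The example makes the trade-off concrete: Union keeps every candidate, while MajVote discards the two spans that only one model proposed.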
A learned combiner, referred to as Meta, was also created to address the weakness of the Union method. Meta pools the predictions of the SEQ and SpanPred models and learns to judge whether each prediction is right or wrong, aiming to keep the correct predictions while filtering out the incorrect ones. On the GENIA dataset, this approach boosted F1 by 1.2 points and recall by 2.1 points over the systems being combined.
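The Meta idea can be sketched as "union, then filter": pool the candidates, then keep only those a trained binary classifier judges correct. The function names and the stand-in scorer below are illustrative assumptions, not the authors' code.

```python
def meta_combine(seq_spans, spanpred_spans, keep_probability):
    """keep_probability(span, sources) -> float in [0, 1]: a trained
    model's estimate that the candidate span is a true entity."""
    # Pool candidates from both models, remembering which model(s)
    # proposed each span.
    candidates = {}
    for source, spans in (("SEQ", seq_spans), ("SpanPred", spanpred_spans)):
        for span in spans:
            candidates.setdefault(span, set()).add(source)
    # Keep only candidates the classifier believes are correct.
    return {span for span, sources in candidates.items()
            if keep_probability(span, sources) >= 0.5}

# Stand-in for the trained classifier: trust spans both models agree on,
# be skeptical of single-model spans.
def toy_scorer(span, sources):
    return 0.9 if len(sources) == 2 else 0.4

seq_spans = {(0, 1, "GENE"), (4, 5, "GENE")}
spanpred_spans = {(0, 1, "GENE")}
print(meta_combine(seq_spans, spanpred_spans, toy_scorer))
# {(0, 1, 'GENE')}
```

In the real system the scorer would be a model trained on features of each candidate, which is what lets Meta recover Union-level recall while raising precision.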
How Models Work
Each method starts with a step that transforms sentences into a format that the model can understand. This transformation creates a representation for each word in the sentence based on its context. Special markers in the text are used to help the models focus on the parts of the sentence that contain relevant information.
For the SEQ and SeqCRF methods, each word is labeled individually, while SpanPred scores different spans of words. For each model, identified entities are assigned to specific categories, such as disease, gene, or species.
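The span side of this pipeline can be sketched in a few lines: enumerate candidate spans up to a maximum width, then represent each one by the vectors of its boundary tokens. This is the general boundary-embedding technique the article describes; the toy vectors and helper names below are assumptions, not the paper's implementation.

```python
import itertools

def enumerate_spans(num_tokens, max_width):
    """All candidate (start, end) spans covering at most max_width tokens."""
    return [(i, j) for i, j in itertools.combinations_with_replacement(
                range(num_tokens), 2)
            if j - i < max_width]

def span_representation(token_vectors, start, end):
    """Boundary representation: concatenation of the start and end
    token vectors; a classifier would score this to type the span."""
    return token_vectors[start] + token_vectors[end]

# Toy 2-d "contextual embeddings" for a 4-token sentence (a real system
# would take these from a pretrained encoder).
token_vectors = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.5], [0.2, 0.2]]

spans = enumerate_spans(len(token_vectors), max_width=2)
print(spans)  # every 1- or 2-token span of the sentence
print(span_representation(token_vectors, 1, 2))  # [0.3, 0.1, 0.0, 0.5]
```

Capping the span width keeps the number of candidates linear in sentence length rather than quadratic, a common practical choice in span-based NER.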
During the evaluation, all predictions were checked against gold-standard annotations. The matching criterion was strict: a prediction counted only if both its boundaries and its entity type exactly matched an annotated entity.
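Strict entity-level scoring reduces to set intersection over (start, end, type) triples. A minimal sketch with hypothetical data:

```python
def exact_match_prf(gold, predicted):
    """Strict evaluation: a prediction is a true positive only if its
    span boundaries and entity type both match a gold entity exactly."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {(0, 1, "DISEASE"), (4, 5, "GENE")}
pred = {(0, 1, "DISEASE"), (4, 6, "GENE")}  # second span is off by one token
print(exact_match_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Under this criterion the off-by-one span earns no credit at all, which is why strict F1 is a demanding metric for NER.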
Findings and Results
During testing, SpanPred generally performed better than the other two models. Particularly in cases where entities overlapped, such as in the LivingNER and GENIA datasets, it proved to be the most effective. However, on clear-cut tasks without overlapping entities, like in SocialDisNER and NCBI-Disease, the results were more balanced among the three methods.
The improvements brought by combining models were clear. Systems developed through the Union method had better recall, meaning they caught more correct predictions, but tended to drop in precision, leading to more incorrect predictions. On the other hand, the MajVote method managed to keep a good balance of high precision and recall, proving more reliable overall.
The Meta approach, designed to enhance the Union method, showed great promise. It effectively increased precision without compromising recall, which is the ideal outcome for any entity recognition task.
Conclusion
The findings demonstrate that while individual models have their strengths and weaknesses, combining different approaches can lead to improved outcomes in biomedical named entity recognition tasks. The use of majority voting and the new Meta model significantly contributes to refining the predictions made by traditional methods.
The ability to accurately identify and classify medical terms is crucial in biomedical research and applications, helping professionals access and utilize information more effectively. Continued advancements in NER methodologies will likely benefit various fields by providing more precise tools for processing vast amounts of textual information.
Title: Comparing and combining some popular NER approaches on Biomedical tasks
Abstract: We compare three simple and popular approaches for NER: 1) SEQ (sequence-labeling with a linear token classifier) 2) SeqCRF (sequence-labeling with Conditional Random Fields), and 3) SpanPred (span-prediction with boundary token embeddings). We compare the approaches on 4 biomedical NER tasks: GENIA, NCBI-Disease, LivingNER (Spanish), and SocialDisNER (Spanish). The SpanPred model demonstrates state-of-the-art performance on LivingNER and SocialDisNER, improving F1 by 1.3 and 0.6 F1 respectively. The SeqCRF model also demonstrates state-of-the-art performance on LivingNER and SocialDisNER, improving F1 by 0.2 F1 and 0.7 respectively. The SEQ model is competitive with the state-of-the-art on the LivingNER dataset. We explore some simple ways of combining the three approaches. We find that majority voting consistently gives high precision and high F1 across all 4 datasets. Lastly, we implement a system that learns to combine the predictions of SEQ and SpanPred, generating systems that consistently give high recall and high F1 across all 4 datasets. On the GENIA dataset, we find that our learned combiner system significantly boosts F1(+1.2) and recall(+2.1) over the systems being combined. We release all the well-documented code necessary to reproduce all systems at https://github.com/flyingmothman/bionlp.
Authors: Harsh Verma, Sabine Bergler, Narjesossadat Tahaei
Last Update: 2023-05-30 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2305.19120
Source PDF: https://arxiv.org/pdf/2305.19120
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arxiv for use of its open access interoperability.
Reference Links
- https://github.com/flyingmothman/bionlp
- https://github.com/flyingmothman/bionlp/blob/d61b02593711b43b5d0f00f0c6ed62fb7685adf3/utils/training.py#L13-L20
- https://huggingface.co/docs/transformers/main_classes/optimizer_schedules#transformers.Adafactor
- https://temu.bsc.es/livingner/2022/01/28/evaluation/