Simple Science

Cutting edge science explained simply


Advancing Medical Knowledge with Graphs

A system to extract and organize medical data using knowledge graphs.



Figure: Building Medical Knowledge Graphs. Extracting valuable insights from clinical texts.

In recent years, medicine has seen rapid advancements. As new discoveries are made, our understanding of diseases, treatments, and diagnoses continues to change, and doctors and researchers keep looking for better ways to help patients. With the growing amount of information available, organizing and retrieving medical knowledge has become a key challenge.

One effective way to manage this information is through the use of Knowledge Graphs. These graphs represent data in a structured way, making it easier to find and use relevant information in patient care and research. They can help doctors by summarizing important facts from complex medical data, allowing for better decisions regarding treatment.

Objective

The goal of this approach is to create a system that automatically extracts useful information from clinical notes and organizes it into a knowledge graph. This is done by using advanced models that understand and analyze medical texts. By converting unstructured text into a structured format, we aim to help healthcare professionals quickly find the information they need.

Methods

The proposed system uses BERT, a pretrained language model, to read clinical notes and identify important entities such as medications, dosages, and side effects. A Conditional Random Field (CRF) layer is added on top of BERT to improve the accuracy of the extraction.
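A CRF layer is commonly used to score a whole sequence of labels rather than each token in isolation. The following minimal sketch shows Viterbi decoding for a linear-chain CRF under that assumption. The label set, the emission scores (which a model such as BERT would normally produce), and the transition scores (which would normally be learned) are made-up values for illustration, not figures from the original work.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain CRF."""
    if not emissions:
        return []
    # best[t][y] is the best score of any label path ending in y at position t.
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # Pick the previous label that maximises path score + transition.
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


LABELS = ["O", "B-Drug", "I-Drug"]
# Hypothetical transition scores: "I-Drug" directly after "O" is heavily penalised.
TRANSITIONS = {("O", "I-Drug"): -10.0}
# Hypothetical per-token scores for the tokens "took", "insulin", "glargine".
EMISSIONS = [
    {"O": 2.0, "B-Drug": 0.1, "I-Drug": 0.0},
    {"O": 0.5, "B-Drug": 1.0, "I-Drug": 1.2},
    {"O": 0.2, "B-Drug": 0.3, "I-Drug": 1.5},
]

greedy = [max(LABELS, key=lambda y: e[y]) for e in EMISSIONS]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, LABELS)
print(greedy)   # ['O', 'I-Drug', 'I-Drug']  (an entity that never begins)
print(decoded)  # ['O', 'B-Drug', 'I-Drug']
```

Choosing the best label per token independently yields an inconsistent sequence here, while decoding over the whole sequence with transition scores repairs it.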

Knowledge graphs help in visualizing complex information clearly. By representing medical entities and their relationships, we can simplify the retrieval of medical knowledge. This structured format allows healthcare professionals to address intricate inquiries effectively.

In developing the knowledge graph, we use natural language processing techniques to recognize entities and extract their relationships from the clinical notes. Once the data is organized in the graph, it can be used for various analyses, such as answering questions about treatments and side effects.

Results

The framework we built successfully extracts structured information from clinical notes. Our experiments show an accuracy of 90.7% for named entity recognition and 88% for relation extraction. These results were obtained on a real-world set of 505 unstructured patient clinical notes.

This knowledge graph can help healthcare professionals by allowing easy access to organized data about patients, medications, and their interactions. It also allows for monitoring the relationships between drugs and clinical notes, ensuring better management of patient care.

Furthermore, many existing solutions only focus on certain parts of creating knowledge graphs and might not consider the overall process, including the analysis of the graph. Our approach aims to cover the entire cycle, from constructing the graph to analyzing it thoroughly.

Knowledge Graph and Applications

A knowledge graph serves as a machine-readable representation of knowledge within a specific domain. It is made up of nodes, edges, and labeled relationships that describe entities (such as patients, medicines, and diseases) and their connections.

These graphs are valuable in numerous fields, from healthcare to education and technology. In medicine, they can be created from clinical notes or medical records. Each node can represent a patient, drug, or medical condition, while edges show the relationships like prescriptions or diagnoses.
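As a rough illustration of the node, edge, and label structure described above, a knowledge graph can be held in memory as a set of labeled edges. The class name, node identifiers, and relationship labels in this sketch are hypothetical and chosen only for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TinyKnowledgeGraph:
    """Stores labeled, directed edges between string-identified nodes."""

    def __init__(self) -> None:
        # Maps a source node to its outgoing (relation, target) pairs.
        self.edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, source: str, relation: str, target: str) -> None:
        self.edges[source].append((relation, target))

    def neighbors(self, source: str, relation: str) -> List[str]:
        """Return the targets reachable from source via the given relation."""
        return [t for r, t in self.edges.get(source, []) if r == relation]


kg = TinyKnowledgeGraph()
kg.add("patient:001", "PRESCRIBED", "drug:metformin")
kg.add("patient:001", "DIAGNOSED_WITH", "condition:type 2 diabetes")
kg.add("drug:metformin", "MAY_CAUSE", "effect:nausea")

print(kg.neighbors("patient:001", "PRESCRIBED"))  # ['drug:metformin']
```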

Despite the potential benefits, the use of knowledge graphs in clinical analysis is still developing. Constructing them from clinical texts can unlock significant insights into patient care and improve workflows in healthcare settings.

Knowledge Graph Construction

Named Entity Recognition

Named Entity Recognition (NER) is an important task in extracting valuable information from text. It involves locating and classifying entities into predefined categories, such as names of drugs, diseases, or medical procedures. In general-purpose NER, the main classes of entities include:

  • Entity class: Includes names of people, places, or institutions.
  • Time class: Comprises dates and time expressions.
  • Number class: Covers things like currency and percentages.

By recognizing these entities, we can better organize the information found in clinical notes.
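NER models typically emit one tag per token, and a small post-processing step groups those tags into entities. The sketch below assumes the widely used BIO tagging scheme (B- begins an entity, I- continues it, O is outside). The tag names and the example sentence are illustrative and not taken from the original study.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity text, entity label) pairs."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            current.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the open entity.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


tokens = ["Patient", "started", "insulin", "glargine", "10", "units", "daily"]
tags = ["O", "O", "B-Drug", "I-Drug", "B-Dosage", "I-Dosage", "B-Frequency"]
print(bio_to_spans(tokens, tags))
# [('insulin glargine', 'Drug'), ('10 units', 'Dosage'), ('daily', 'Frequency')]
```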

Coreference Resolution

Coreference resolution identifies different expressions that refer to the same entity. For example, it recognizes that "the patient" and "he" refer to the same individual in a document. This reduces ambiguity in the text and makes the context easier to interpret.

Relation Extraction

Relation extraction identifies and classifies the relationships between entities within the text. For instance, it can help determine if a patient is taking a specific drug or if a drug is linked to a particular side effect.

Transformers and BERT

Transformers are advanced neural network architectures that excel in various natural language processing tasks. Specifically, BERT, which stands for Bidirectional Encoder Representations from Transformers, is designed to understand the context of words in text effectively.

BERT is pretrained on large text corpora and can be fine-tuned for specific tasks, such as extracting relationships in medical records. The linguistic patterns it learns during pretraining make it well suited to processing clinical texts.

BERT Variants

Several specialized versions of BERT exist that cater specifically to the biomedical field. Some of these include:

  • BioBERT: This version is further pretrained on biomedical literature (PubMed abstracts and PMC full-text articles) and can be fine-tuned for tasks like NER and relation extraction.
  • ClinicalBERT: Focuses on clinical notes and captures relationships between medical concepts that agree with human judgment.
  • Bio+Discharge Summary BERT: This variant is adapted to discharge summaries, a type of clinical note, and can be fine-tuned for various clinical tasks.

These models leverage the existing knowledge of medical texts to provide better outcomes in understanding and processing clinical information.

Proposed Model

General Architecture

The proposed model combines various phases to create a robust approach for building knowledge graphs from clinical notes. These phases include:

  1. Data Preparation: Preparing the clinical notes for analysis.
  2. Coreference Resolution: Identifying expressions referring to the same entity.
  3. Named Entity Recognition: Extracting relevant entities from the text.
  4. Relation Extraction: Classifying relationships between extracted entities.
  5. Graph Storage: Storing the results in a structured database for analysis.

The phases work together to transform raw clinical notes into a knowledge graph. This enables easier access to structured information for analysis and decision-making.

Knowledge Extraction

The NER module is critical as it identifies key information within lengthy clinical notes. Since these notes can exceed the input length that models can handle, we developed a function to split them into manageable parts while preserving context. This ensures that the model can work effectively with the data.
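The splitting function itself is not described in this summary. One common way to split a long note while preserving context is a sliding window with overlap, sketched below. The function name, the window size, and the overlap size are assumptions for illustration only.

```python
from typing import List


def split_note(tokens: List[str], max_len: int = 128, overlap: int = 16) -> List[List[str]]:
    """Split a token list into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that an entity near a
    boundary appears with some surrounding context in at least one window.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    chunks: List[List[str]] = []
    step = max_len - overlap
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last window already reaches the end of the note
        start += step
    return chunks


note = [f"t{i}" for i in range(10)]
chunks = split_note(note, max_len=4, overlap=1)
print(chunks)
# [['t0', 't1', 't2', 't3'], ['t3', 't4', 't5', 't6'], ['t6', 't7', 't8', 't9']]
```

A larger overlap keeps more context at the cost of processing more tokens twice. Entities found in the overlapping region would also need to be de-duplicated when the per-window results are merged.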

The use of various BERT variants tailored for the medical domain significantly boosts the model's performance, allowing for an accurate extraction of entities and their relationships.

Relation Extraction

The relation extraction task is performed using sequence classification. Here, the model assesses the connections between pairs of entities and determines their relationships. By classifying these relationships, we can enrich the knowledge graph with relevant connections that inform patient care.
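To treat relation extraction as sequence classification, each candidate pair of entities is usually turned into its own input sequence. The sketch below shows one common way to do that, by wrapping the two entities in marker tokens. The marker strings, the entity labels, and the choice of drugs as the head entity are assumptions, and the classifier that would consume these sequences is not shown.

```python
from itertools import product
from typing import List, Tuple

# (start token index, end token index exclusive, label); entities must not overlap.
Entity = Tuple[int, int, str]


def mark_pair(tokens: List[str], head: Entity, tail: Entity) -> str:
    """Wrap the head entity in [E1]...[/E1] and the tail in [E2]...[/E2]."""
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i == head[0]:
            out.append("[E1]")
        if i == tail[0]:
            out.append("[E2]")
        out.append(tok)
        if i == head[1] - 1:
            out.append("[/E1]")
        if i == tail[1] - 1:
            out.append("[/E2]")
    return " ".join(out)


def candidate_pairs(tokens: List[str], entities: List[Entity], head_label: str = "Drug") -> List[str]:
    """Build one marked sequence per (head, tail) pair to be classified."""
    heads = [e for e in entities if e[2] == head_label]
    tails = [e for e in entities if e[2] != head_label]
    return [mark_pair(tokens, h, t) for h, t in product(heads, tails)]


tokens = ["Metformin", "500", "mg", "caused", "nausea"]
entities = [(0, 1, "Drug"), (1, 3, "Dosage"), (4, 5, "ADE")]
pairs = candidate_pairs(tokens, entities)
for sequence in pairs:
    print(sequence)
# [E1] Metformin [/E1] [E2] 500 mg [/E2] caused nausea
# [E1] Metformin [/E1] 500 mg caused [E2] nausea [/E2]
```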

Graph Construction

The knowledge graph data model focuses on five key types of entities: medications, patients, reasons for taking drugs, side effects, and prescriptions. Each entity has specific attributes that contribute to the overall understanding of patient interactions with medications.

Graph Data Model

The graph is designed to include nodes representing:

  • Medication details: Including dosage, frequency, and form.
  • Patient identifiers: Unique identifiers for patients.
  • Reasons for medication: The justification for prescribing a drug.
  • Adverse effects: Side effects associated with medications.

This structured representation allows for efficient analysis and understanding of the relationships within the medical domain.
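The node types listed above could be mirrored in code as simple typed records. This is a sketch under assumptions: the class and field names are hypothetical, and modeling a prescription as the object that links a medication to its reason and adverse effects is one possible design, not necessarily the one used in the original work.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Medication:
    name: str
    dosage: Optional[str] = None      # e.g. "500 mg"
    frequency: Optional[str] = None   # e.g. "twice daily"
    form: Optional[str] = None        # e.g. "tablet"


@dataclass
class Prescription:
    medication: Medication
    reason: Optional[str] = None                      # why the drug was prescribed
    adverse_effects: List[str] = field(default_factory=list)


@dataclass
class PatientRecord:
    patient_id: str                                   # unique patient identifier
    prescriptions: List[Prescription] = field(default_factory=list)


record = PatientRecord(patient_id="001")
record.prescriptions.append(
    Prescription(
        medication=Medication("metformin", dosage="500 mg", frequency="twice daily", form="tablet"),
        reason="type 2 diabetes",
        adverse_effects=["nausea"],
    )
)
print(record.prescriptions[0].medication.name)  # metformin
```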

Graph Analysis

The ultimate goal of building the knowledge graph is to facilitate insightful analysis. A variety of algorithms can be applied to uncover valuable information from the graph, such as identifying the most frequently prescribed medications or recognizing common side effects.
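A frequency analysis of this kind reduces to counting edges of a given type. The sketch below does this over a small list of (source, relation, target) triples. The triples and relation names are invented for the example and do not come from the study's data.

```python
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (source node, relation label, target node)

# Hypothetical graph content.
triples: List[Triple] = [
    ("patient:001", "PRESCRIBED", "metformin"),
    ("patient:002", "PRESCRIBED", "metformin"),
    ("patient:003", "PRESCRIBED", "lisinopril"),
    ("metformin", "MAY_CAUSE", "nausea"),
    ("lisinopril", "MAY_CAUSE", "cough"),
]


def top_targets(graph: List[Triple], relation: str, n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent target nodes for one relation type."""
    counts = Counter(target for _, rel, target in graph if rel == relation)
    return counts.most_common(n)


print(top_targets(triples, "PRESCRIBED", n=1))  # [('metformin', 2)]
```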

Results and Evaluation

The dataset for this study consists of clinical notes from a well-known medical care database. By using these records, we can evaluate the effectiveness of our system in extracting relevant information.

Tool Usage

To implement the knowledge graph, we use Neo4j, a graph database management system. It allows for efficient graph storage and retrieval, and its query language, Cypher, is designed for graph data, enabling quick analysis.
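As a hedged illustration of how extracted facts might be written to and read from Neo4j, the sketch below builds parameterized Cypher statements in Python. The node labels, relationship type, and property names are assumptions, not the schema used in the original work, and no database connection is opened here.

```python
from typing import Any, Dict, Tuple


def prescription_upsert(patient_id: str, drug: str, dosage: str) -> Tuple[str, Dict[str, Any]]:
    """Build a Cypher statement that creates the nodes and edge if missing."""
    query = (
        "MERGE (p:Patient {id: $patient_id}) "
        "MERGE (d:Drug {name: $drug}) "
        "MERGE (p)-[r:PRESCRIBED]->(d) "
        "SET r.dosage = $dosage"
    )
    # Values travel as parameters instead of being pasted into the query text.
    return query, {"patient_id": patient_id, "drug": drug, "dosage": dosage}


# Analysis query: the five most frequently prescribed drugs.
TOP_DRUGS_QUERY = (
    "MATCH (:Patient)-[:PRESCRIBED]->(d:Drug) "
    "RETURN d.name AS drug, count(*) AS prescriptions "
    "ORDER BY prescriptions DESC LIMIT 5"
)

query, params = prescription_upsert("001", "metformin", "500 mg")
print(query)
print(params)
```

With the official Neo4j Python driver, a statement and its parameters like these could be passed to a session's `run` method. Using MERGE rather than CREATE means that re-processing the same note does not duplicate nodes or relationships.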

Performance Insights

The processing of clinical notes into a structured graph is notably quick, taking only a few seconds. This rapid generation ensures that healthcare professionals can get timely insights from the data, making it a practical tool in real-world applications.

Conclusion

In conclusion, the construction of a biomedical knowledge graph from clinical texts offers a significant advancement in managing medical data. This system effectively organizes information, making it accessible for healthcare professionals and contributing to better patient care.

As we continue to enhance our methods, the ultimate aim is to create a search engine that will allow doctors and patients to access the knowledge graph easily. By broadening the application of knowledge graphs in medical and pharmaceutical domains, we can further improve healthcare practices.

The challenges posed by the sheer volume of medical data necessitate innovative solutions. The development of knowledge graphs marks an essential step toward making sense of this data, allowing for better decision-making and more effective treatment strategies.

Original Source

Title: BERT Based Clinical Knowledge Extraction for Biomedical Knowledge Graph Construction and Analysis

Abstract: Background : Knowledge is evolving over time, often as a result of new discoveries or changes in the adopted methods of reasoning. Also, new facts or evidence may become available, leading to new understandings of complex phenomena. This is particularly true in the biomedical field, where scientists and physicians are constantly striving to find new methods of diagnosis, treatment and eventually cure. Knowledge Graphs (KGs) offer a real way of organizing and retrieving the massive and growing amount of biomedical knowledge. Objective : We propose an end-to-end approach for knowledge extraction and analysis from biomedical clinical notes using the Bidirectional Encoder Representations from Transformers (BERT) model and Conditional Random Field (CRF) layer. Methods : The approach is based on knowledge graphs, which can effectively process abstract biomedical concepts such as relationships and interactions between medical entities. Besides offering an intuitive way to visualize these concepts, KGs can solve more complex knowledge retrieval problems by simplifying them into simpler representations or by transforming the problems into representations from different perspectives. We created a biomedical Knowledge Graph using Natural Language Processing models for named entity recognition and relation extraction. The generated biomedical knowledge graphs (KGs) are then used for question answering. Results : The proposed framework can successfully extract relevant structured information with high accuracy (90.7% for Named-entity recognition (NER), 88% for relation extraction (RE)), according to experimental findings based on real-world 505 patient biomedical unstructured clinical notes. Conclusions : In this paper, we propose a novel end-to-end system for the construction of a biomedical knowledge graph from clinical textual using a variation of BERT models.

Authors: Ayoub Harnoune, Maryem Rhanoui, Mounia Mikram, Siham Yousfi, Zineb Elkaimbillah, Bouchra El Asri

Last Update: 2023-04-21 00:00:00

Language: English

Source URL: https://arxiv.org/abs/2304.10996

Source PDF: https://arxiv.org/pdf/2304.10996

Licence: https://creativecommons.org/licenses/by-nc-sa/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
