Simple Science

Cutting edge science explained simply


New Insights on Sleep and Alzheimer's Disease

Research highlights the link between sleep and Alzheimer's using NLP algorithms.


Alzheimer's Disease (AD) is the most common form of dementia in the United States. About 5.7 million Americans currently have AD, and this number is expected to rise to 13.8 million by 2050 as the population ages. In 2015, more than 110,000 deaths were linked to AD, making it a leading cause of death, especially among older adults. In contrast to the death rates for other causes such as stroke, the death rate from AD has increased significantly.

Postponing the onset of dementia even by a year could reduce the number of people affected by AD and lower care costs. Therefore, early intervention to reduce the risk of AD is essential for better public health.

Social and Behavioral Factors Affecting Health

Social and behavioral factors, known as social determinants of health (SDOH), play a key role in the risk of developing AD. These factors can be changed and offer an opportunity to lower the risk of the disease. One important factor is sleep. Research shows that sleep is critical for maintaining brain health as people age.

However, the relationship between sleep and AD is complicated. Some studies find that sleep problems, such as difficulty falling asleep, excessive daytime sleepiness, or poor sleep quality, may increase the risk of developing cognitive issues and could be early signs of future AD. On the other hand, some studies find no link between sleep issues and cognitive decline. Furthermore, as people with AD may face sleep problems due to the disease itself, the relationship between sleep and cognitive health is not straightforward.

Despite growing interest in studying how sleep impacts AD, more long-term studies involving large groups are needed to clarify this relationship. One challenge in conducting this research is that traditional methods for gathering sleep and AD data can be slow and inefficient.

The Role of Electronic Health Records

Healthcare organizations collect a vast amount of electronic health records (EHRs), which offer a chance to analyze large patient groups and understand trends. EHRs have been used in AD research to assess care use, identify health issues, and explore health disparities. However, sleep information often remains underused in AD research.

A major issue with using EHRs for sleep research is that most sleep information is hidden within doctor notes. To tackle this, researchers have turned to Natural Language Processing (NLP), a technology that helps process and understand written language. While NLP has been used in many health studies, there have been no algorithms specifically designed to gather sleep information from the notes of AD patients.

Developing Algorithms to Extract Sleep Information

In response to this gap, researchers created several types of NLP algorithms to extract sleep-related information from the clinical notes of AD patients. These included a rule-based algorithm, machine learning models, and large language models (LLMs), all aimed at identifying concepts such as snoring, napping, daytime sleepiness, and sleep duration.

The research team trained and tested these algorithms using clinical notes collected from a healthcare provider. The rule-based NLP algorithm performed the best at identifying sleep concepts from the notes.

Gathering and Preparing Data

To start, the team defined a group of patients diagnosed with AD. They collected clinical notes from these patients over a five-year period. After gathering the data, they cleaned it to ensure accuracy, removing duplicates and organizing the information.
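The duplicate-removal step can be sketched as follows. This is a minimal illustration, assuming that two notes count as duplicates when their text is identical after lowercasing and collapsing whitespace; the study's actual cleaning rules are not described here, and the function names are hypothetical.

```python
import hashlib

def normalize(text):
    """Lowercase the text and collapse runs of whitespace (assumed duplicate criterion)."""
    return " ".join(text.lower().split())

def deduplicate_notes(notes):
    """Keep the first occurrence of each note, comparing notes by normalized text."""
    seen = set()
    unique = []
    for note in notes:
        # Hash the normalized text so long notes are cheap to compare.
        digest = hashlib.sha256(normalize(note).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(note)
    return unique

cleaned = deduplicate_notes(["Pt sleeps well.", "pt  sleeps well.", "Reports snoring."])
```

Here the second note differs from the first only in case and spacing, so it is dropped.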

Next, the researchers needed to find sleep-related information within the notes. They performed a keyword search to identify documents containing sleep discussions and selected a portion of these for further analysis.
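A keyword search of this kind might look like the sketch below. The keyword list is a hypothetical stand-in, since the study's actual search terms are not given in this summary.

```python
import re

# Hypothetical keyword stems; the study's real search terms are not listed here.
SLEEP_KEYWORDS = ["sleep", "snor", "nap", "insomnia", "drows"]

# Match any word that starts with one of the stems, ignoring case.
KEYWORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(SLEEP_KEYWORDS) + r")\w*", re.IGNORECASE
)

def filter_sleep_notes(notes):
    """Return only the notes that contain at least one sleep-related keyword."""
    return [note for note in notes if KEYWORD_PATTERN.search(note)]

notes = [
    "Patient reports loud snoring and frequent naps.",
    "Blood pressure stable. Continue current medication.",
    "Family notes she sleeps about 5 hours per night.",
]
sleep_notes = filter_sleep_notes(notes)
```

A stem-based filter like this is deliberately broad (for example, "nap" would also match "napkin"), which is acceptable when the goal is only to narrow down candidate documents for later review.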

Creating a Gold Standard Dataset

To ensure the information gathered was accurate, a random sample of 570 clinical notes was reviewed manually to create a "gold standard" dataset. Health informatics students reviewed the notes to identify mentions of various sleep-related issues, such as snoring and sleep problems. The researchers refined the annotation process until they reached a satisfactory level of agreement among the annotators.
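Agreement between two annotators is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The summary does not say which agreement measure the team used, so the sketch below is only an illustration with made-up labels.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both annotators used a single identical label throughout.
    return (observed - expected) / (1 - expected)

# Made-up labels: 1 = note mentions snoring, 0 = it does not.
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 0, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy example the annotators agree on 6 of 8 notes, but because most notes are negative, a good share of that agreement is expected by chance, so kappa is noticeably lower than 0.75.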

Building NLP Algorithms

The researchers created a rule-based NLP algorithm named nlp4sleep to extract sleep information from the clinical notes. They used established medical terminology to identify keywords related to sleep issues. By analyzing the data, they developed specific rules that allowed the algorithm to accurately pinpoint sleep-related concepts.
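To give a feel for what rule-based extraction involves, here is a minimal sketch using regular expressions. The patterns and concept names are illustrative assumptions only; they are not the actual nlp4sleep rules, which are not reproduced in this summary.

```python
import re

# Illustrative rules only; these are NOT the actual nlp4sleep rules.
RULES = {
    "snoring": re.compile(r"\bsnor(?:e|es|ed|ing)\b", re.IGNORECASE),
    "napping": re.compile(r"\bnap(?:s|ped|ping)?\b", re.IGNORECASE),
    "daytime_sleepiness": re.compile(
        r"\b(?:daytime (?:sleepiness|somnolence)|sleepy during the day)\b",
        re.IGNORECASE,
    ),
    "sleep_duration": re.compile(
        r"\bsleeps?\s+(?:about\s+|around\s+)?(\d+(?:\.\d+)?)\s*(?:hours?|hrs?)\b",
        re.IGNORECASE,
    ),
}

def extract_sleep_concepts(note):
    """Return a dict mapping each matched concept to the list of matched text spans."""
    found = {}
    for concept, pattern in RULES.items():
        matches = [m.group(0) for m in pattern.finditer(note)]
        if matches:
            found[concept] = matches
    return found

note = "Wife reports loud snoring. Patient naps daily and sleeps about 5 hours at night."
concepts = extract_sleep_concepts(note)
```

Each rule is transparent and easy to adjust after reviewing errors, which is one reason rule-based systems can do well when annotated training data is scarce.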

In addition, they trained machine learning models to classify the sleep concepts. Different types of models were tested, including Decision Trees, Logistic Regression, K-Nearest Neighbors, and Support Vector Machines. While these models showed varying degrees of success, they generally struggled with false positives, meaning they sometimes misidentified unrelated text as related to sleep.

Language Models and Enhancements

To improve extraction methods, the researchers also explored more advanced language models. They used a model known as LLAMA2, which integrated reasoning processes to better understand and classify the sleep concepts present in the notes.

This model was trained on a set of examples to help it accurately find sleep-related information within the clinical narratives. The LLAMA2 model, particularly when fine-tuned, showed promising results in identifying sleep issues and provided a good balance of sensitivity (correctly identifying sleep issues) and specificity (correctly disregarding irrelevant information).

Evaluating Algorithm Performance

The researchers evaluated each algorithm on how accurately it identified sleep concepts in the gold standard dataset. The rule-based NLP algorithm consistently outperformed the other approaches, achieving the best F1 score for every sleep concept along with high sensitivity and specificity.
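The metrics mentioned in this article (sensitivity, specificity, Positive Predictive Value, and F1) all derive from counts of correct and incorrect predictions. The sketch below computes them for a single concept; the labels are made up for illustration and are not results from the study.

```python
def confusion_counts(gold, predicted):
    """Count outcomes for binary labels (1 = concept present, 0 = absent)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)
    return tp, fp, fn, tn

def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def evaluate(gold, predicted):
    tp, fp, fn, tn = confusion_counts(gold, predicted)
    sensitivity = safe_div(tp, tp + fn)   # share of true mentions that were found
    specificity = safe_div(tn, tn + fp)   # share of non-mentions correctly ignored
    ppv = safe_div(tp, tp + fp)           # share of flagged mentions that were real
    f1 = safe_div(2 * ppv * sensitivity, ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv, "f1": f1}

# Made-up labels for one concept across eight notes.
gold      = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
scores = evaluate(gold, predicted)
```

False positives lower PPV and specificity, which is the weakness the machine learning models showed, while missed mentions lower sensitivity.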

While machine learning models did provide some valuable insights, they were more prone to making mistakes. This variability highlighted the challenges of using machine learning in clinical applications where accuracy is crucial.

Analyzing Errors

The research team also conducted a thorough analysis of the errors made by the rule-based NLP algorithm. They found that some mistakes stemmed from misunderstandings in the clinical text or from difficulties in correctly identifying negations, such as when a note mentioned a patient not experiencing sleep problems.

Complexity in how clinical notes are written, including overlapping concepts, made extracting accurate information more challenging.
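The negation problem can be illustrated with a simple window-based check: a concept is treated as negated if a negation cue appears within a few words before it. This is a simplified sketch of a common technique, not the study's method, and the cue list and concept list are hypothetical.

```python
import re

# Hypothetical cue and concept lists; real clinical systems use far larger ones.
NEGATION_CUES = re.compile(
    r"\b(?:no|not|denies|denied|without|negative for)\b", re.IGNORECASE
)
CONCEPT = re.compile(r"\b(?:snoring|insomnia|daytime sleepiness)\b", re.IGNORECASE)

def find_concepts_with_negation(sentence, window=5):
    """Return (concept, is_negated) pairs for one sentence.

    A concept is marked negated when a negation cue occurs within
    `window` words before it.
    """
    results = []
    for match in CONCEPT.finditer(sentence):
        # Look only at the last `window` words preceding the concept.
        preceding_words = sentence[:match.start()].split()[-window:]
        negated = bool(NEGATION_CUES.search(" ".join(preceding_words)))
        results.append((match.group(0).lower(), negated))
    return results

negated_case = find_concepts_with_negation("Patient denies snoring or insomnia.")
affirmed_case = find_concepts_with_negation("Wife reports loud snoring at night.")
scope_error = find_concepts_with_negation("No fever, but reports snoring.")
```

The first two sentences are handled correctly, but in the third the cue "No" belongs to "fever" and the window-based check still marks snoring as negated. Scope mistakes of this kind are one way negation handling can go wrong in clinical text.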

Importance of Accurate Documentation

The study revealed that sleep-related information is not well-documented in clinical notes. Many patients had little to no information recorded about their sleep issues, which can complicate understanding the broader landscape of sleep and AD.

This under-documentation raises questions about the reliability of using EHRs for research and whether existing records can effectively support studies aimed at understanding the relationship between sleep and AD.

Challenges and Future Directions

There are several ongoing challenges in this area of research. The initial criteria used to select patients and gather data may not be ideal. The study's annotated dataset was also relatively small, which affects the generalizability of the findings.

Moving forward, researchers plan to investigate more advanced methods to retrieve relevant information about sleep from clinical notes, focusing on making the data collection process more effective.

This effort could significantly contribute to understanding the crucial connection between sleep and AD. Since sleep is a modifiable lifestyle factor, further research could lead to better interventions that address sleep disturbances in those with AD.

By building accurate and efficient tools for extracting sleep-related information from EHRs, researchers can advance their understanding of how sleep affects cognitive health, ultimately benefiting those affected by AD and similar conditions.

Conclusion

In summary, this study highlights the potential of NLP to extract meaningful sleep information from clinical notes related to Alzheimer's Disease. The rule-based NLP algorithm has been shown to be effective in identifying sleep concepts, outperforming other approaches. As researchers continue to refine these tools, they can better understand how sleep impacts cognitive health and develop interventions to help those with Alzheimer's Disease.

Original Source

Title: Extraction of Sleep Information from Clinical Notes of Patients with Alzheimer's Disease Using Natural Language Processing

Abstract:

Objective: Alzheimer's Disease (AD) is the most common form of dementia in the United States. Sleep is one of the lifestyle-related factors that has been shown critical for optimal cognitive function in old age. However, there is a lack of research studying the association between sleep and AD incidence. A major bottleneck for conducting such research is that the traditional way to acquire sleep information is time-consuming, inefficient, non-scalable, and limited to patients' subjective experience.

Materials and Methods: A gold standard dataset is created from manual annotation of 570 randomly sampled clinical note documents from the adSLEEP, a corpus of 192,000 de-identified clinical notes of 7,266 AD patients retrieved from the University of Pittsburgh Medical Center (UPMC). We developed a rule-based Natural Language Processing (NLP) algorithm, machine learning models, and Large Language Model (LLM)-based NLP algorithms to automate the extraction of sleep-related concepts, including snoring, napping, sleep problem, bad sleep quality, daytime sleepiness, night wakings, and sleep duration, from the gold standard dataset.

Results: Rule-based NLP algorithm achieved the best performance of F1 across all sleep-related concepts. In terms of Positive Predictive Value (PPV), rule-based NLP algorithm achieved 1.00 for daytime sleepiness and sleep duration, machine learning models: 0.95 and for napping, 0.86 for bad sleep quality and 0.90 for snoring; and LLAMA2 with finetuning achieved PPV of 0.93 for Night Wakings, 0.89 for sleep problem, and 1.00 for sleep duration.

Discussion: Although sleep information is infrequently documented in the clinical notes, the proposed rule-based NLP algorithm and LLM-based NLP algorithms still achieved promising results. In comparison, the machine learning-based approaches didn't achieve good results, which is due to the small size of sleep information in the training data.

Conclusion: The results show that the rule-based NLP algorithm consistently achieved the best performance for all sleep concepts. This study focused on the clinical notes of patients with AD, but could be extended to general sleep information extraction for other diseases.

Authors: Yanshan Wang, S. Sivarajkumar, T. Y. C. Tam, H. Ahamed Mohammad, S. Viggiano, D. Oniani, S. Visweswaran

Last Update: 2024-03-15 00:00:00

Language: English

Source URL: https://www.medrxiv.org/content/10.1101/2022.03.29.22273078

Source PDF: https://www.medrxiv.org/content/10.1101/2022.03.29.22273078.full.pdf

Licence: https://creativecommons.org/licenses/by-nc/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to medrxiv for use of its open access interoperability.
