Using Technology to Improve Death Data Collection
This study investigates new ways to gather mortality information using online sources.
Mohammed Al-Garadi, Michele LeNoue-Newton, Michael E. Matheny, Melissa McPheeters, Jill M. Whitaker, Jessica A. Deere, Michael F. McLemore, Dax Westerman, Mirza S. Khan, José J. Hernández-Muñoz, Xi Wang, Aida Kuzucan, Rishi J. Desai, Ruth Reeves
― 8 min read
Table of Contents
- The Importance of Accurate Death Data
- Other Sources of Death Data
- Social Media: A New Hope?
- Our Study: Extracting Death Information Using Technology
- Collecting the Data
- Preparing to Train Our Tools
- Building the Technology
- Evaluating Our Results
- Findings: What Did We Learn?
- Addressing the Challenges
- Implications and Future Directions
- Conclusion
- Original Source
Mortality, or death, is an important aspect of healthcare research. Researchers study various reasons why people die and how often it happens to understand health trends and improve patient care. One of the most common ways to look at this is through all-cause mortality, which means considering all reasons for death. Knowing when and why someone dies is crucial for many types of health research, including clinical trials and safety monitoring of medical products.
The Importance of Accurate Death Data
Accurate details about deaths, like the time and cause, are essential for effective research. If researchers fail to capture this information, it can lead to big mistakes, like underestimating how many people die due to certain medical products. This can have serious consequences for public health.
Researchers have found that poor access to date of death and cause of death information is a major barrier to conducting thorough studies. For example, the US FDA has a system called the Sentinel Active Risk Identification and Analysis (ARIA) system to address regulatory questions. This system relies on accurate death data, and any gaps can lead to incomplete studies.
The best source of death data in the United States comes from vital statistics collected from death certificates filled out by coroners, medical examiners, or doctors. Once this information is compiled at the state level, it is sent to the Centers for Disease Control and Prevention (CDC) for further coding and analysis. However, there's a catch: it takes a long time, usually about nine months, before this information is released to the public.
Other Sources of Death Data
While death certificates are the “gold standard” for mortality data, there are other sources like claims databases and medical records. But there is a downside to these sources too. Claims databases might miss information about uninsured people, while medical records can vary widely between healthcare providers, making it hard to combine the data for analysis.
When it comes to claims databases, details about deaths are often incomplete or not recorded at all. Similarly, when we look at electronic health records, they often lack comprehensive death data, especially if patients weren’t under the care of that specific health system when they died. This missing information creates significant challenges for researchers who want to use these databases for studies on public health and care quality.
Social Media: A New Hope?
In recent years, social media has emerged as a potential new source of death-related information. People share news about deaths on platforms like Twitter, GoFundMe, and various memorial websites. While these may seem like unconventional sources, researchers are starting to tap into these online platforms to gather information that could be useful for healthcare research.
There’s growing interest in using social media for public health. User posts have been used to track illnesses, measure risky behaviors, find disease hotspots, and analyze medication usage. However, a big challenge remains: how to effectively extract date and cause of death information from all that noise. Although social media may offer quicker and broader coverage of mortality information, turning it into usable data comes with its own set of hurdles.
Our Study: Extracting Death Information Using Technology
In this study, we aimed to develop tools that could capture both the fact of a death and the cause of death from publicly available online sources. These tools would allow us to see whether social media and obituary data contain enough useful information to improve understanding of mortality trends. By combining this information with traditional sources, we hoped to enhance the quality of the data used in healthcare research.
We relied on a technology called Natural Language Processing (NLP) to sift through all the information on social media and other online platforms. NLP allows computers to understand and interpret human language, making it easier to extract relevant data.
Collecting the Data
We collected information from various online sources, including Twitter, GoFundMe, and multiple obituary websites from 2015 to 2022. We looked for posts containing keywords like “death” and “deceased.” To put it simply, we went hunting for anything that could help us gather details about mortality.
To gather data from Twitter, we used around 50 keywords, which led us to about 40 million tweets! Then, we applied the same strategy to GoFundMe and memorial websites, but with slightly different methods. For the obituary sources, we collected structured details like names, dates of birth, and dates of death.
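The keyword-based collection step can be pictured with a small sketch. The keyword list below is an assumption for illustration only; the study used roughly 50 terms, which are not enumerated in this summary.

```python
import re

# Hypothetical subset of death-related keywords (the study used ~50 terms).
KEYWORDS = ["death", "deceased", "passed away", "in loving memory"]
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def matches_death_keywords(post: str) -> bool:
    """Return True if the post mentions any death-related keyword."""
    return PATTERN.search(post) is not None

posts = [
    "Sadly, my grandfather passed away last night.",
    "Great concert tonight!",
]
filtered = [p for p in posts if matches_death_keywords(p)]
```

In practice a filter like this is only the first pass; it keeps anything that mentions a keyword, so later NLP steps must separate genuine death announcements from song lyrics, jokes, and other noise.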
Once we had all this information, we used NLP techniques to fill in any gaps or correct any mistakes. The idea was to maximize the data we could extract and build a comprehensive dataset on mortality.
Preparing to Train Our Tools
To train our NLP tools, we needed a gold-standard reference dataset. To achieve this, we created an annotated set of records using names, dates, and causes of death. We instructed the annotators on how to classify the data accurately. They categorized names, dates, and causes, ensuring that every detail was accounted for.
A total of 4,200 records were sampled for training, testing, and validation, with each record being scrutinized to ensure high quality. We even calculated agreement rates between the annotators to ensure everyone was on the same page.
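Agreement between annotators is typically measured with a chance-corrected statistic such as Cohen's kappa. The summary does not state which statistic the study used, so the sketch below, with made-up labels, is only one common way to do it.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labeled at random
    # according to their own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators over six text spans.
annotator_1 = ["name", "date", "cause", "name", "date", "other"]
annotator_2 = ["name", "date", "cause", "name", "cause", "other"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

A kappa near 1.0 indicates the annotation guidelines are being applied consistently; low values usually mean the guidelines need revision before training data is finalized.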
Building the Technology
We worked on two NLP tools in parallel to extract the necessary information from the online sources. We utilized deep learning methods, specifically transformer-based models, to handle the complex task of identifying names, dates, and causes of death.
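The study's extraction component is a fine-tuned transformer (RoBERTa) token classifier, which cannot be reproduced in a few lines. As a stand-in, the rule-based sketch below illustrates the shape of that component's job: take raw post text in, return labeled spans (here, just date-like spans) out.

```python
import re

# Simplified illustration only: the actual study fine-tuned a RoBERTa
# token classifier for this. A regex stands in for the model here.
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b"
)

def extract_date_spans(text: str):
    """Return date-like spans that a token classifier would label as dates."""
    return DATE_RE.findall(text)

post = "John Smith passed away on March 3, 2021 after a long illness."
dates = extract_date_spans(post)
```

A learned model earns its keep where rules like this fail: misspellings, relative dates ("last Tuesday"), and names or causes that have no fixed surface pattern.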
For identifying causes of death, we relied on a technique called few-shot learning, which is a way to train models using only a small number of examples. This technique used a large language model that can understand context and produce accurate results.
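A few-shot setup works by showing the language model a handful of worked examples inside the prompt itself, then asking it to complete the pattern for a new input. The example texts, labels, and prompt wording below are all assumptions for illustration; the study's actual prompts are not given in this summary.

```python
# Hypothetical few-shot examples: (source text, primary cause of death).
FEW_SHOT_EXAMPLES = [
    ("She lost her battle with breast cancer on May 2, 2020.", "breast cancer"),
    ("He died peacefully after a long struggle with heart failure.", "heart failure"),
]

def build_prompt(text: str) -> str:
    """Assemble a few-shot prompt ending where the model should answer."""
    lines = ["Extract the primary cause of death from the text.", ""]
    for example, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Text: {example}")
        lines.append(f"Primary cause: {label}")
        lines.append("")
    lines.append(f"Text: {text}")
    lines.append("Primary cause:")
    return "\n".join(lines)

prompt = build_prompt("He passed away from complications of COVID-19.")
```

The completed prompt would then be sent to an LLM, whose next tokens are taken as the predicted cause; no task-specific fine-tuning is needed, which is what makes the approach attractive when labeled data is scarce.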
Evaluating Our Results
After we developed our tools, we tested them on a new set of data to evaluate their performance. We wanted to see how well our NLP models could identify the causes of death compared to human annotators. Trained nurses reviewed the results to ensure everything met the established guidelines.
Our evaluations involved comparing the results from our models and the human annotators. This allowed us to measure how accurately both were able to identify the primary cause of death and all other relevant information.
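The abstract reports precision, recall, and a micro-averaged F1-score, which pool true positives, false positives, and false negatives across all entity types before computing the ratios. The counts below are invented purely to show the arithmetic; they are not the study's actual confusion counts.

```python
def micro_prf(tp: int, fp: int, fn: int):
    """Micro-averaged precision, recall, and F1 from pooled counts."""
    precision = tp / (tp + fp)          # of predicted spans, how many were right
    recall = tp / (tp + fn)             # of true spans, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts pooled over names, dates, and causes of death.
p, r, f1 = micro_prf(tp=880, fp=120, fn=120)
```

Micro-averaging weights every extracted span equally, so frequent entity types (like dates) influence the score more than rare ones; a macro average, by contrast, would weight each entity type equally.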
Findings: What Did We Learn?
After crunching the data, we found that our NLP tools performed quite well! One model, called RoBERTa, stood out, achieving a micro-averaged F1-score of 0.88 in extracting names, dates, and causes of death.
Interestingly, when we compared causes of death from our models with those identified by human annotators, we found that our automated system performed admirably. In some cases, the model was even better at identifying the primary cause of death than humans!
However, we did notice that our model struggled a bit with identifying additional causes of death, particularly in sources that listed multiple conditions.
Addressing the Challenges
As great as our results were, we did encounter a few hurdles along the way. One of the biggest challenges was that social media data doesn’t always represent the entire population fairly. Some segments of society might not be as active online, which could lead to gaps in the data.
Additionally, while our systems gathered a lot of accurate information, some of the details could be unclear. Extracting the cause of death from text can be tricky, especially when there are multiple conditions mentioned. While our methods performed well, there’s still room for improvement.
Implications and Future Directions
Automated extraction of mortality information from online sources holds great potential for healthcare research. Traditional mortality databases often suffer from delays in reporting and can be incomplete. By utilizing social media and online obituary data, we could quickly gather critical details about deaths that could contribute to better healthcare research.
Moreover, if we can validate these online sources, they could provide timely insights into emerging trends in mortality. This would be particularly valuable in tracking health crises, like pandemics or environmental disasters.
To effectively incorporate this online data into existing public health systems, collaboration between researchers and public health agencies will be essential. Developing protocols for integrating this data into surveillance systems could enhance public health responses.
Conclusion
In conclusion, our study demonstrated the promising potential of using advanced NLP techniques to extract critical mortality information from various online sources. By tapping into social media and obituaries, we can fill gaps in traditional mortality databases and provide more timely and comprehensive insights into health trends.
However, as we move forward, it’s important to acknowledge the limitations of online data and the need for further refinement of our tools. By continuing to improve our methods and validating our findings, we can ensure that this new approach to mortality data in healthcare research becomes even more valuable.
So, while we ponder the mysteries of life and death, we might just find that our laptops can help shed some light on the subject!
Title: Automated Extraction of Mortality Information from Publicly Available Sources Using Language Models
Abstract:

Background: Mortality is a critical variable in healthcare research, but inconsistencies in the availability of death date and cause of death (CoD) information limit the ability to monitor medical product safety and effectiveness.

Objective: To develop scalable approaches using natural language processing (NLP) and large language models (LLMs) for the extraction of mortality information from publicly available online data sources, including social media platforms, crowdfunding websites, and online obituaries.

Methods: Data were collected from public posts on X (formerly Twitter), GoFundMe campaigns, memorial websites (EverLoved.com and TributeArchive.com), and online obituaries from 2015 to 2022. We developed an NLP pipeline using transformer-based models to extract key mortality information such as decedent names, dates of birth, and dates of death. We then employed a few-shot learning (FSL) approach with LLMs to identify primary and secondary causes of death. Model performance was assessed using precision, recall, F1-score, and accuracy metrics, with human-annotated labels serving as the reference standard for the transformer-based model and a human adjudicator blinded to labeling source for the FSL model reference standard.

Results: The best-performing model obtained a micro-averaged F1-score of 0.88 (95% CI, 0.86-0.90) in extracting mortality information. The FSL-LLM approach demonstrated high accuracy in identifying primary CoD across various online sources. For GoFundMe, the FSL-LLM achieved 95.9% accuracy for primary cause identification, compared to 97.9% for human annotators. In obituaries, FSL-LLM accuracy was 96.5% for primary causes, while human accuracy was 99.0%. For memorial websites, FSL-LLM achieved 98.0% accuracy for primary causes, with human accuracy at 99.5%.

Conclusions: These findings highlight the potential of leveraging advanced NLP techniques and publicly available data to enhance the timeliness, comprehensiveness, and granularity of mortality surveillance.

Funding statement: This project was supported by Task Order 75F40123F19010 under Master Agreement 75F40119D10037 from the US Food and Drug Administration (FDA). FDA coauthors reviewed the study protocol, statistical analysis plan, and the manuscript for scientific accuracy and clarity of presentation. Representatives of the FDA reviewed a draft of the manuscript for the presence of confidential information and accuracy regarding the statement of any FDA policy. The views expressed are those of the authors and not necessarily those of the US FDA.
Authors: Mohammed Al-Garadi, Michele LeNoue-Newton, Michael E. Matheny, Melissa McPheeters, Jill M. Whitaker, Jessica A. Deere, Michael F. McLemore, Dax Westerman, Mirza S. Khan, José J. Hernández-Muñoz, Xi Wang, Aida Kuzucan, Rishi J. Desai, Ruth Reeves
Last Update: 2024-11-01 00:00:00
Language: English
Source URL: https://www.medrxiv.org/content/10.1101/2024.10.28.24316027
Source PDF: https://www.medrxiv.org/content/10.1101/2024.10.28.24316027.full.pdf
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to medrxiv for use of its open access interoperability.