Simple Science

Cutting edge science explained simply

Machine Learning Techniques in COVID-19 Diagnosis

Analyzing patient symptoms through machine learning to improve COVID-19 diagnoses.


COVID-19 (Coronavirus Disease 2019) is caused by the SARS-CoV-2 virus. The first cases were reported in Wuhan, China, at the end of 2019, and in March 2020 the World Health Organization (WHO) declared it a global pandemic. The virus spreads easily between people and mutates over time, producing new variants. These mutations, combined with how easily the virus spreads, have driven repeated waves of infection worldwide.

To limit the number of new infections, testing became critical, especially for people who had symptoms or had been in contact with infected individuals. When testing was not available, doctors relied on patients' descriptions of their symptoms to assess their condition. Analyzing these signs and symptoms is therefore essential for making quick and accurate diagnoses, which can lead to better treatment.

Machine Learning in Healthcare

Machine Learning (ML) is a branch of Artificial Intelligence (AI) that has been applied in healthcare since the mid-20th century. Its role has grown significantly as patient data has increasingly been collected digitally. ML is valued for its potential to generate new insights in healthcare by analyzing large amounts of diverse data far faster than people can.

However, some ML systems are complex and difficult for healthcare professionals to interpret. This complexity can make it challenging for them to trust the results produced by these systems.

Goals of the Study

The main goal of this study was to examine ML methods and models for diagnosing COVID-19 based only on patient signs and symptoms, with the aim of assisting healthcare providers during outbreaks. The specific objectives were:

  1. Evaluate how effective ML techniques are in diagnosing COVID-19, especially during different waves of contagion.
  2. Make the models' COVID-19 diagnoses easier to interpret, helping healthcare providers decide on the best treatment.
  3. Identify the most common symptoms during various waves of infections.
  4. Analyze how symptoms vary across different waves of COVID-19.

Related Studies

Previous studies have used various methods to build COVID-19 diagnostic tools. For example, one study in Jordan built a diagnostic tool using ML techniques that achieved over 90% accuracy, and another study in England analyzed data from over a million participants to assess infection rates. By focusing on symptoms and test results, these studies found that certain symptoms were more reliable indicators of COVID-19 than others.

While many studies have used ML to support COVID-19 diagnosis, this research differs by focusing on how symptoms change over time and how the type of test affects the results. The study also emphasizes balanced data, so that the models are not biased toward whichever outcome (positive or negative) is more common in the dataset.
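The summary does not say how the authors balanced their data. One common approach is random undersampling: keep every record of the less frequent class and randomly drop records of the more frequent one until both classes are the same size. The sketch below assumes each record is a dictionary with a boolean `positive` field, which is a hypothetical schema rather than the study's.

```python
import random


def undersample(records, label_key="positive", seed=0):
    """Balance a binary-labelled dataset by random undersampling.

    Assumes (hypothetically) that each record is a dict whose `label_key`
    field is True for a positive test and False for a negative one.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    positives = [r for r in records if r[label_key]]
    negatives = [r for r in records if not r[label_key]]
    n = min(len(positives), len(negatives))
    # The minority class is kept whole (sampling n of n is just a shuffle);
    # the majority class is cut down to a random subset of size n.
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)
    return balanced


# Toy data: 3 positive and 7 negative records.
data = [{"positive": True} for _ in range(3)] + [{"positive": False} for _ in range(7)]
balanced = undersample(data)
print(len(balanced), sum(r["positive"] for r in balanced))  # 6 3
```

Undersampling throws data away; oversampling the minority class or weighting the classes during training are common alternatives.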

Methodology

The research followed several steps. First, data on patients' symptoms and test results was collected from health systems or online forms. Next, in a preprocessing phase, the data was organized and missing values and outliers were handled. The data was then grouped by wave of infection.
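Here is a minimal sketch of the missing-data and wave-grouping steps. The wave boundaries below are made up for illustration because the summary does not give the study's cut-off dates. Records are assumed to be dictionaries with an `onset` date field. Records with a missing onset date are simply dropped, which is only one of several ways to handle missing data.

```python
from datetime import date

# Hypothetical wave windows for illustration only; NOT the study's dates.
WAVES = [
    ("wave_1", date(2020, 3, 1), date(2020, 10, 31)),
    ("wave_2", date(2020, 11, 1), date(2021, 12, 31)),
    ("wave_3", date(2022, 1, 1), date(2022, 12, 31)),
]


def assign_wave(onset):
    """Return the name of the wave whose window contains `onset`, or None."""
    for name, start, end in WAVES:
        if start <= onset <= end:
            return name
    return None


def group_by_wave(records):
    """Drop records without an onset date, then bucket the rest by wave."""
    groups = {name: [] for name, _, _ in WAVES}
    for record in records:
        onset = record.get("onset")
        if onset is None:  # missing value: dropped in this sketch
            continue
        wave = assign_wave(onset)
        if wave is not None:  # onset outside every window: treated as out of scope
            groups[wave].append(record)
    return groups


records = [
    {"onset": date(2020, 5, 2)},
    {"onset": None},
    {"onset": date(2022, 1, 15)},
]
groups = group_by_wave(records)
print({name: len(rs) for name, rs in groups.items()})
# {'wave_1': 1, 'wave_2': 0, 'wave_3': 1}
```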

Once the data was prepared, several ML algorithms were tested to see how well they could predict COVID-19 from the reported symptoms. Four classification algorithms were compared: Random Forest, Multi-Layer Perceptron, XGBoost, and Logistic Regression. A further technique, SHapley Additive exPlanations (SHAP), was used to explain which symptoms drove each model's results rather than to make predictions.
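Of the four classifiers, Logistic Regression is the simplest to show from scratch. The sketch below fits it with plain batch gradient descent on binary symptom vectors, where 1 means the symptom was reported. It illustrates the technique and is not the study's implementation, which the summary does not describe. The three-symptom toy dataset is invented.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights and bias by batch gradient descent on log-loss."""
    m, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error p - y is the log-loss gradient w.r.t. the linear score.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(w, b, x):
    """Estimated probability of a positive test for symptom vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data: columns are [fever, cough, anosmia]; labels are test results.
X = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
print([round(predict_proba(w, b, x)) for x in X])  # [1, 1, 1, 0, 0, 0]
```

Each learned weight indicates how strongly a symptom raises or lowers the predicted odds of a positive test. This is part of why Logistic Regression is often used as an interpretable baseline next to tree ensembles and neural networks.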

Results Overview

The study used data from a health facility in Rio de Janeiro, the Piquet Carneiro Polyclinic of the State University of Rio de Janeiro, where patients who sought care were tested for COVID-19. The research aimed to assess how the symptoms reported by patients changed across the waves of the pandemic. The symptoms were grouped and analyzed to build a clearer picture of which signs might indicate an infection.

The analysis showed that different symptoms were influential in each wave. For instance, fever and cough were consistently important, while other symptoms, such as nasal congestion, became more relevant in later waves.

Data Analysis and Findings

The study divided data into various groups based on different waves and types of tests. This allowed for a better understanding of how symptoms presented themselves during each wave and how effective different testing methods were.

The Random Forest algorithm outperformed the others in many scenarios. Performance was measured with accuracy (the share of all cases classified correctly), sensitivity (the share of positive cases correctly identified), and specificity (the share of negative cases correctly identified). In its best result, for one contagion wave, the Random Forest model reached up to 76% sensitivity, 86% specificity, and 79% accuracy.
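All three metrics come from the counts of true and false positives and negatives. Here is a minimal sketch that uses made-up labels rather than study data:

```python
def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = COVID-19 positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed infections
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(diagnostic_metrics(y_true, y_pred))
# {'accuracy': 0.8, 'sensitivity': 0.75, 'specificity': 0.8333333333333334}
```

Reporting sensitivity and specificity separately matters because accuracy alone can look good on imbalanced data even when many infections are missed.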

Overall, the results indicated that certain symptoms were strong predictors of COVID-19 infection. Symptoms such as fever, cough, and myalgia were significant across multiple waves, while others varied in importance.

Impact of Testing Methods

The study compared RT-PCR (reverse transcription polymerase chain reaction) tests with two types of rapid test (RT): antigen tests and antibody tests. RT-PCR and RT antigen tests were more reliable for diagnosing active COVID-19 infections than RT antibody tests, which often gave less reliable results, especially early in the infection.

Tests performed within a specific time window after symptom onset also gave better results. For instance, accuracy improved considerably when tests were done 3 to 7 days after symptoms began, which shows the value of timely testing in managing the disease.
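Selecting tests that fall in such a window takes only simple date arithmetic. The sketch assumes the window includes both endpoints (3 and 7 days), which the summary does not specify. The record fields `onset` and `tested` are hypothetical.

```python
from datetime import date


def within_testing_window(onset, test_date, min_days=3, max_days=7):
    """True if the test was taken min_days..max_days (inclusive) after symptom onset."""
    days_after_onset = (test_date - onset).days
    return min_days <= days_after_onset <= max_days


# Hypothetical records: symptom-onset date and test date for each patient.
tests = [
    {"onset": date(2021, 1, 1), "tested": date(2021, 1, 1)},  # same day
    {"onset": date(2021, 1, 1), "tested": date(2021, 1, 5)},  # 4 days later
    {"onset": date(2021, 1, 1), "tested": date(2021, 1, 12)},  # 11 days later
]
in_window = [t for t in tests if within_testing_window(t["onset"], t["tested"])]
print(len(in_window))  # 1
```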

Symptoms Variation Across Waves

An important part of the analysis examined how symptoms changed over the course of the pandemic. The importance of certain symptoms fluctuated from wave to wave. During the first wave, anosmia (loss of smell) was a critical indicator, but as the pandemic continued, nasal congestion and sore throat became more prominent.
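One simple way to see such shifts is to compare, wave by wave, the share of positive cases that report each symptom. The sketch below uses invented toy records, not study data. It assumes each record holds a `positive` flag and a set of reported `symptoms`.

```python
from collections import Counter


def symptom_prevalence(records):
    """Fraction of positive cases that reported each symptom."""
    positives = [r for r in records if r["positive"]]
    if not positives:
        return {}
    counts = Counter(s for r in positives for s in r["symptoms"])
    return {s: c / len(positives) for s, c in counts.items()}


def top_symptoms(records, k=3):
    """The k most prevalent symptoms among positives (ties broken alphabetically)."""
    prevalence = symptom_prevalence(records)
    return sorted(prevalence, key=lambda s: (-prevalence[s], s))[:k]


# Invented toy records for two waves.
early_wave = [
    {"positive": True, "symptoms": {"fever", "anosmia"}},
    {"positive": True, "symptoms": {"anosmia", "cough"}},
    {"positive": True, "symptoms": {"anosmia"}},
    {"positive": False, "symptoms": {"cough"}},
]
later_wave = [
    {"positive": True, "symptoms": {"nasal congestion", "sore throat"}},
    {"positive": True, "symptoms": {"nasal congestion"}},
    {"positive": False, "symptoms": {"fever"}},
]
print(top_symptoms(early_wave))  # ['anosmia', 'cough', 'fever']
print(top_symptoms(later_wave))  # ['nasal congestion', 'sore throat']
```

Prevalence only describes positives. It says nothing about how well a symptom separates positives from negatives, which is what model-based importance measures capture.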

The findings highlighted that models trained on data from one wave did not perform well when used with data from another wave. This suggests that public health strategies may need to adapt as the virus evolves and the symptoms associated with infection change.
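A cross-wave check can be organized as a matrix: train on each wave, then score the model on every wave. The sketch below uses a deliberately trivial hypothetical classifier that predicts positive whenever one chosen symptom is present. The data is invented toy data. In a real evaluation, the diagonal entries would use held-out data from the same wave rather than the training records themselves.

```python
def rule_accuracy(symptom, records):
    """Share of records where 'symptom present' equals 'tested positive'."""
    hits = sum((symptom in r["symptoms"]) == r["positive"] for r in records)
    return hits / len(records)


def fit_one_symptom_rule(records, symptoms):
    """Toy classifier: pick the symptom whose presence best matches the test result."""
    return max(sorted(symptoms), key=lambda s: rule_accuracy(s, records))


def cross_wave_matrix(waves, symptoms):
    """result[(train_wave, test_wave)] = accuracy of the rule fitted on train_wave."""
    result = {}
    for train_name, train_records in waves.items():
        rule = fit_one_symptom_rule(train_records, symptoms)
        for test_name, test_records in waves.items():
            # Diagonal entries reuse the training data here; use a held-out split in practice.
            result[(train_name, test_name)] = rule_accuracy(rule, test_records)
    return result


waves = {
    "wave_a": [
        {"positive": True, "symptoms": {"anosmia", "fever"}},
        {"positive": True, "symptoms": {"anosmia"}},
        {"positive": False, "symptoms": {"fever"}},
        {"positive": False, "symptoms": {"nasal congestion"}},
    ],
    "wave_b": [
        {"positive": True, "symptoms": {"nasal congestion", "fever"}},
        {"positive": True, "symptoms": {"nasal congestion"}},
        {"positive": False, "symptoms": {"fever"}},
        {"positive": False, "symptoms": {"anosmia"}},
    ],
}
matrix = cross_wave_matrix(waves, ["anosmia", "fever", "nasal congestion"])
for (train, test), acc in sorted(matrix.items()):
    print(f"train={train} test={test} accuracy={acc:.2f}")
# train=wave_a test=wave_a accuracy=1.00
# train=wave_a test=wave_b accuracy=0.25
# train=wave_b test=wave_a accuracy=0.25
# train=wave_b test=wave_b accuracy=1.00
```

The off-diagonal drop is the pattern described above: a rule learned when anosmia was the key signal fails once nasal congestion takes over, and vice versa.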

Explanation of Results

To help healthcare providers interpret the results of the ML models, the study used an explainability method, SHAP. This approach showed which symptoms most influenced each diagnostic prediction. By understanding these relationships, healthcare professionals can make more informed decisions about treatment options.

For example, using visualizations, the study showed which symptoms had the greatest effect on diagnosing COVID-19 in each wave. It also showed that some symptoms pushed predictions toward a positive result while others pushed them toward a negative one, which can guide healthcare providers in their assessments.
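SHAP is built on the Shapley value from game theory. A symptom's contribution is its average marginal effect on the model's output across all orders in which the symptoms could be "switched on". The `shap` library computes this efficiently, for example with dedicated algorithms for tree ensembles, and can average over a background dataset. The brute-force sketch below is a simplification: it uses a single baseline (all symptoms absent) and a made-up scoring function, so it is only practical for a handful of features. A positive value pushes the score toward a positive diagnosis, and a negative value pushes it toward a negative one.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for a single prediction.

    Features in a coalition take the patient's values; all others take the
    baseline values. Runtime grows as 2**n, so keep n small.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight = probability that exactly this coalition precedes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Made-up risk score over [fever, cough, anosmia, sore throat]. The fever*cough
# term is an interaction; sore throat lowers the score in this toy model only.
def toy_score(z):
    fever, cough, anosmia, sore_throat = z
    return 0.1 + 0.2 * fever + 0.1 * cough + 0.4 * anosmia - 0.15 * sore_throat + 0.1 * fever * cough


phi = shapley_values(toy_score, [1, 1, 1, 1], baseline=[0, 0, 0, 0])
print([round(v, 3) for v in phi])  # [0.25, 0.15, 0.4, -0.15]
```

The interaction term is split evenly between fever and cough, and the values sum to the difference between the patient's score and the baseline score. This additivity is the property that makes SHAP plots readable symptom by symptom.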

Conclusions and Future Directions

In conclusion, this study shows that ML techniques can effectively analyze symptoms to assist in COVID-19 diagnosis. The findings emphasize the need for timely and accurate testing and the importance of understanding symptom variation over time.

However, the models also revealed limitations, particularly when trying to apply findings from one wave to another. It’s clear that as the virus changes, so do the signs and symptoms associated with it, indicating a need for ongoing research and adaptation in diagnostic approaches.

For future studies, researchers aim to incorporate new data related to virus variants and vaccination status. This could provide deeper insights into how different signs and symptoms correlate with specific variants. There are also plans to apply the same methodologies to other diseases, potentially expanding their impact on public health.

Original Source

Title: Analysis of signs and symptoms of SARS-CoV-2 virus infection considering different waves using Machine Learning

Abstract: In March 2020, the World Health Organization declared a world pandemic of COVID-19, which can manifest in humans as a consequence of virus infection of SARS-CoV-2. In this context, this work uses Data Mining and Machine Learning techniques for the infection diagnosis. A methodology was created to facilitate this task and can be applied in any outbreak or pandemic wave. Besides generating diagnosis models based only on signals and symptoms, the method can evaluate if there are differences in signals and symptoms between waves (or outbreaks) through explainable techniques of the machine learning models. Another aspect is identifying possible quality differences between exams, for example, Rapid Test (RT) and Reverse Transcription-Polymerase Chain Reaction (RT-PCR). The case study in this work is based on data from patients who sought care at Piquet Carneiro Polyclinic of the State University of Rio de Janeiro. In this work, the results obtained with the tests were used to diagnose symptomatic infection of the SARS-CoV-2 virus, based on related signals and symptoms, and the date of the initial of these signals and symptoms. Using the Random Forest model, it was possible to achieve the result of up to 76% sensitivity, 86% specificity, and 79% accuracy in the results of tests in one contagion wave of the SARS-CoV-2 virus. Moreover, differences were found in signals and symptoms between contagion waves, in addition to the observation that exams RT-PCR and RT Antigen tests are more reliable than RT antibody test.

Authors: Felipe Cassemiro Ulrichsen, A. da Costa Sena, K. Figueiredo, L. C. Porto

Last Update: 2024-02-13

Language: English

Source URL: https://www.medrxiv.org/content/10.1101/2024.02.12.24302722

Source PDF: https://www.medrxiv.org/content/10.1101/2024.02.12.24302722.full.pdf

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to medrxiv for use of its open access interoperability.
