Simple Science

Cutting edge science explained simply

# Health Sciences # Rehabilitation Medicine and Physical Therapy

Predicting Stroke-Associated Pneumonia Risk

New models help identify pneumonia risk in stroke patients.

Ting Wang, C. Li, J. Yuan, L. Yuan, M. You

― 5 min read



Stroke is a serious health issue that can greatly affect a person's quality of life and survival. In 2019, about 6.55 million people died from stroke worldwide, making it one of the leading causes of death. Many stroke patients go on to face further complications, and one of the most common is stroke-associated pneumonia (SAP). Studies report that SAP occurs in 7% to 38% of stroke patients. It can lead to longer hospital stays, higher medical costs, and a greater risk of death. Doctors currently treat SAP mainly with antibiotics, but antibiotics may not effectively lower the chance of developing SAP in the first place. It is therefore essential for healthcare providers to quickly identify patients at high risk for SAP and take steps to prevent it, which can improve patient outcomes.

The Importance of Risk Prediction Models

Models that predict the risk of SAP can help doctors identify high-risk patients early, allowing for timely interventions that reduce the chance of SAP developing. Researchers have created several prediction models in recent years, such as scoring systems that help healthcare professionals assess a patient's risk. However, even reliable models can lose their effectiveness over time as risk factors, treatment methods, or patient populations change, so they need to be updated regularly. In addition, few studies have used interpretable machine learning techniques to build SAP prediction models. This study combines new and known predictors using machine learning, and uses a method called SHAP (Shapley additive explanations) to explain the predictions.

Study Design and Participants

This study examined 763 stroke patients admitted to the Second Affiliated Hospital of Anhui University of Traditional Chinese Medicine between July 1, 2023, and May 31, 2024. To be included, participants had to be at least 18 years old, diagnosed with stroke, and not have needed a breathing machine within a week of the stroke. Patients were excluded if they were discharged or died within 24 hours of admission, had a lung infection before the stroke, chose to stop treatment, or had mostly missing data. The research followed ethical guidelines and was approved by the hospital's ethics committee.

Identifying Predictive Factors

Researchers identified 27 factors that could help predict the risk of SAP. These factors included general demographic information, such as age and gender, as well as medical details like the patients' daily living abilities, type and location of stroke, presence of swallowing difficulties, and other health conditions like high blood pressure and diabetes. Other factors included patients' personal histories, treatments received, and various laboratory test results.

What is SAP?

SAP is defined as pneumonia that develops within seven days of a stroke in patients who do not need a breathing machine. It is diagnosed according to established clinical guidelines, so that each diagnosis meets the same medical standard.

Sample Size Calculation

To determine how many patients needed to be included in the study, researchers used a method that takes multiple factors into account to ensure an accurate prediction model. Based on existing data, they estimated that between 701 and 1272 patients should be included to create a reliable model.

Data Collection and Preparation

Researchers collected data by reviewing electronic medical records, including admission records and lab results. To keep the data collection unbiased, they kept the outcome information separate from the predictive factors. To handle missing data, they used a method that preserves the data's accuracy and integrity. After sorting the data, they randomly split it in a 6:4 ratio: a training set of 457 patients for building the prediction model and a validation set of 306 patients for testing how well the model worked.
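To make the split concrete, here is a minimal sketch in Python of a random 6:4 split. The abstract states that the authors worked in the tidymodels framework, so this is not their code: the function name, the seed, the placeholder patient IDs, and the choice to round the training size down are all assumptions made for illustration. With 763 patients, rounding down happens to reproduce the 457 and 306 reported in the abstract.

```python
import random

def split_patients(patient_ids, train_fraction=0.6, seed=42):
    """Shuffle patient IDs and split them into training and validation lists.

    Hypothetical helper: the seed and the round-down rule are assumptions.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    n_train = int(len(ids) * train_fraction)  # 763 * 0.6 = 457.8, rounded down
    return ids[:n_train], ids[n_train:]

# Placeholder IDs 0..762 stand in for the 763 patients in the study.
train_ids, valid_ids = split_patients(range(763))
print(len(train_ids), len(valid_ids))  # 457 306

# Incidence reported in the abstract: 242 SAP cases out of 763 patients.
print(f"{242 / 763:.2%}")  # 31.72%
```

Keeping the validation patients completely out of model building is what makes the later performance figures an honest estimate rather than a measure of how well the model memorized its training data.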

Building and Evaluating the Model

The study compared several methods for building a model that predicts the risk of SAP. Researchers first used a technique called Lasso regression to narrow the 27 candidate factors down to six: nasogastric tube use, age, activities of daily living (ADL) score, and three lab results (albumin, high-sensitivity C-reactive protein, and hemoglobin). They then trained five machine learning methods: decision tree, logistic regression, XGBoost, support vector machine, and LightGBM. XGBoost performed best on the validation set, with an area under the curve (AUC) of 0.926, an accuracy of 0.914, and an F1 score of 0.889.
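The original abstract summarizes model performance with three numbers: AUC, accuracy, and F1 score. The sketch below shows how each is computed from true outcomes and predicted risks, using only the Python standard library. The eight patients, their scores, and the 0.5 decision threshold are invented for illustration and are not data from the study.

```python
def auc(y_true, scores):
    """AUC as the chance that a random SAP case gets a higher score
    than a random non-SAP case (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_and_f1(y_true, scores, threshold=0.5):
    """Accuracy and F1 after turning scores into yes/no predictions."""
    pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # SAP, flagged
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # no SAP, flagged
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # SAP, missed
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # no SAP, not flagged
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return accuracy, f1

# Invented toy data: 1 = developed SAP, 0 = did not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

print(round(auc(y_true, scores), 3))
acc, f1 = accuracy_and_f1(y_true, scores)
print(acc, round(f1, 3))
```

AUC does not depend on a threshold, while accuracy and F1 do, which is one reason studies usually report all three. F1 is useful here because only about a third of patients develop SAP, so accuracy alone can look good even when many cases are missed.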

Understanding the Model’s Predictions

The SHAP method helps explain how each predictor contributes to the model's outcome. It provides insights into the importance of each variable. For instance, lower daily living activity scores indicated a higher risk for SAP. This may be because limited self-care ability can lead to longer bed rest and a greater chance of infections.

Similarly, using a nasogastric tube for feeding was identified as a risk factor. This tube can lead to complications that increase the likelihood of pneumonia. Older patients also showed a higher risk, likely due to a natural decline in immune function with age. Among the lab results, high levels of high-sensitivity C-reactive protein and low levels of hemoglobin were associated with a higher risk of SAP.
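The idea behind SHAP can be sketched with a tiny example. The code below computes exact Shapley values for a single prediction by trying every combination of features, where "removing" a feature means setting it to a baseline value. This is a simplification: SHAP implementations for tree models such as XGBoost use faster algorithms and average over background data. The risk function, its coefficients, the patient, and the baseline are all hypothetical and are not the study's model.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction by enumerating feature subsets.

    Features outside a subset are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(subset):
        mixed = {k: (x[k] if k in subset else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for subsets of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi

def toy_risk(p):
    """Hypothetical risk score, invented for illustration only."""
    risk = 0.10
    risk += 0.004 * max(p["age"] - 60, 0)  # older age raises risk
    risk += 0.003 * max(60 - p["adl"], 0)  # lower ADL score raises risk
    if p["ng_tube"]:
        risk += 0.15                       # nasogastric tube raises risk
        if p["adl"] < 40:
            risk += 0.10                   # extra risk when both are present
    return risk

patient = {"age": 78, "adl": 30, "ng_tube": True}    # hypothetical patient
baseline = {"age": 65, "adl": 70, "ng_tube": False}  # hypothetical reference

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in phi.items():
    print(name, round(contribution, 3))
```

The contributions add up exactly to the gap between this patient's predicted risk and the baseline risk, which is the property that makes SHAP explanations easy to read. In this toy model the extra risk from having both a nasogastric tube and a low ADL score is shared equally between those two features.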

Machine Learning Advantages

Machine learning techniques, like the ones used in this study, have distinct advantages over traditional methods. They can efficiently handle large amounts of data and uncover complex relationships that simpler models might miss. The XGBoost method stood out in this study for its accuracy and ability to provide interpretable results, making it a valuable tool in predicting risks for patients.

Limitations of the Study

Despite the promising results, the study had some limitations. First, it was conducted at a single center, which may restrict the findings' applicability to other settings. The study also relied on existing medical records, potentially leading to incomplete data. Furthermore, external validation in diverse populations has not yet been performed, meaning the model's generalizability needs more examination. Future efforts should focus on improving the model by incorporating more factors and testing other advanced methods.

Conclusion

The models developed through this research show strong potential in predicting the risk of pneumonia for stroke patients. The XGBoost model performed particularly well and provides practical insights that can assist healthcare providers in making informed decisions. The use of the SHAP method offers a clearer understanding of the factors that influence the risk of developing pneumonia, ultimately aiding in patient care and improving outcomes.

Original Source

Title: Prediction of stroke-associated pneumonia risk in stroke patients based on interpretable machine learning

Abstract: Background: Stroke-associated pneumonia (SAP) is a frequent complication of stroke, characterized by its high incidence rate, and it can have a severe impact on the prognosis of patients. The limitations of current clinical treatment measures underscore the critical need to identify high-risk factors promptly to decrease the incidence of SAP. Objective: To analyze the risk factors of SAP in stroke patients, construct a predictive model of SAP based on the SHAP interpretable machine learning method, and explain the important variables. Methods: A total of 763 stroke patients admitted to the Second Affiliated Hospital of Anhui University of Traditional Chinese Medicine from July 1, 2023, to May 31, 2024, were selected and randomly divided into the model training set (n=457) and model validation set (n=306) according to the ratio of 6:4. Firstly, the included data were sorted out, and then Lasso regression was used to screen the included characteristic variables. Based on the tidymodels framework, classification models were constructed using five machine learning methods: decision tree (DT), logistic regression, extreme gradient boosting (XGBoost), support vector machine (SVM), and LightGBM. Grid search and 5-fold cross-validation were used to optimize the hyperparameters and the performance index of the model. The predictive performance of the model was evaluated by the area under the receiver operating characteristic curve (AUC), calibration curve, and decision curve analysis (DCA), and we used Shapley additive explanation (SHAP) to account for the model predictions and provide interpretable insights. Results: The incidence of SAP in this study was 31.72% (242/763). Six variables were selected by Lasso regression, including nasogastric tube use, age, ADL score, Alb, Hs-CRP, and Hb. The model with the best performance in the validation set was the XGBoost model, with an AUC of 0.926, an accuracy of 0.914, and an F1 score of 0.889. Its calibration curve and DCA showed good performance. The SHAP algorithm showed that ADL score ranked first in importance. Conclusion: The model constructed using XGBoost has good prediction performance and clinical applicability, which is expected to support clinical decision-making and improve the prognosis of patients.

Authors: Ting Wang, C. Li, J. Yuan, L. Yuan, M. You

Last Update: 2024-10-29 00:00:00

Language: English

Source URL: https://www.medrxiv.org/content/10.1101/2024.10.27.24316222

Source PDF: https://www.medrxiv.org/content/10.1101/2024.10.27.24316222.full.pdf

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to medrxiv for use of its open access interoperability.
