Addressing Bias in Brain Age Prediction Models
Analysis shows demographic disparities in brain age prediction accuracy.
― 6 min read
Predicting brain age from MRI scans is becoming a common way to help identify various brain diseases. However, the data used to train these prediction models often lack diversity in terms of race and sex, which could lead to biased outcomes. This article examines how different demographic groups fare with one specific model and what that model's features can reveal. Our goal is to highlight the need for fair analysis of demographic differences in brain age prediction models.
Background
With the growing population and longer life spans, age-related brain diseases such as dementia are increasingly prevalent. Therefore, connecting brain aging to these diseases is crucial for better diagnosis and treatment. Brain age prediction could help identify how a person's brain health compares to typical standards. Different studies have suggested using predicted brain age as a sign of various brain conditions, including epilepsy and other clinical risk factors. Most research uses structural MRI scans, which are common in hospitals and provide high-quality images of the brain.
Despite the advantages of using established datasets like the UK Biobank and Cam-CAN, there's a noticeable lack of racial and ethnic diversity. Many of these studies are primarily based on White individuals, which could mean that the prediction models work less effectively for other groups. Here, we focus on a model known as ResNet-34, examining its effectiveness across different demographic groups.
Materials and Methods
For our study, we trained the brain age prediction model on MRI scans from healthy volunteers in the Cam-CAN and IXI datasets, and then tested it on the much larger UK Biobank dataset, which includes information on race and sex. We used statistical tests to evaluate performance and considered demographics when analyzing the results.
We trained the model on T1-weighted MRI images and preprocessed the scans to improve quality. To check for biases in model performance, we split the test set by race and sex, giving six demographic subgroups.
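As a minimal sketch of how such a split could be constructed, assuming a metadata table with hypothetical `race` and `sex` columns (the actual UK Biobank field names and category labels differ):

```python
import pandas as pd

# Hypothetical test-set metadata; column and category names are illustrative,
# not the actual UK Biobank fields.
meta = pd.DataFrame({
    "subject_id": [1, 2, 3, 4, 5, 6],
    "race": ["White", "Black", "Asian", "White", "Other", "Black"],
    "sex": ["Female", "Male", "Female", "Male", "Female", "Female"],
})

# Keep the three largest racial categories, drop "Other", and cross race
# with sex to obtain the six race-by-sex subgroups.
meta = meta[meta["race"].isin(["White", "Black", "Asian"])].copy()
meta["subgroup"] = meta["race"] + " " + meta["sex"]
print(meta["subgroup"].value_counts())
```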
Performance Analysis
We first looked at the absolute prediction error, which measures how far the model's predicted age is from the actual age. To ensure we had enough participants in each subgroup, we combined some racial categories and excluded those labeled as "Other." This gave a clearer comparison between the groups.
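A sketch of how per-subgroup absolute prediction error might be computed, using illustrative values rather than the study's data:

```python
import pandas as pd

# Illustrative predictions and chronological ages (not the study's data).
df = pd.DataFrame({
    "subgroup": ["White Female", "White Female", "Black Male", "Black Male"],
    "true_age": [52.0, 67.0, 58.0, 71.0],
    "pred_age": [50.5, 69.0, 64.0, 63.5],
})

# Absolute prediction error per subject, summarised per subgroup.
df["abs_error"] = (df["pred_age"] - df["true_age"]).abs()
print(df.groupby("subgroup")["abs_error"].agg(["mean", "median", "count"]))
```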
Next, we compared performance across these six subgroups. Because the data did not meet the usual assumptions of parametric tests, we used the non-parametric Kruskal-Wallis test instead, followed by post-hoc Conover-Iman tests for pairwise comparisons. This gave us a robust picture of how the subgroups differed.
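A minimal sketch of the omnibus comparison using SciPy's Kruskal-Wallis implementation, with synthetic error values standing in for the real per-subject errors:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic absolute prediction errors for six race-by-sex subgroups,
# standing in for the real per-subject errors.
errors_by_subgroup = {
    "White Female": rng.gamma(2.0, 2.0, 200),
    "White Male":   rng.gamma(2.2, 2.0, 200),
    "Black Female": rng.gamma(2.5, 2.0, 120),
    "Black Male":   rng.gamma(2.8, 2.0, 120),
    "Asian Female": rng.gamma(2.3, 2.0, 80),
    "Asian Male":   rng.gamma(2.4, 2.0, 80),
}

# Kruskal-Wallis H-test: does at least one subgroup's error distribution
# differ from the others?
h_stat, p_value = stats.kruskal(*errors_by_subgroup.values())
print(f"H = {h_stat:.2f}, p = {p_value:.4g}")
```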
Results
Our analysis showed noticeable differences in how accurately the model predicted brain age across racial and sex groups. Specifically, predictions for Black individuals were less accurate than those for both White and Asian individuals, and errors for male subjects differed significantly from those for female subjects. This points to clear disparities in how well the model works for different demographics.
Looking at the model's features, we also found differences linked to demographic factors. The information the model relies on for its predictions can at times reflect race or biological sex, raising concerns about fairness and accuracy.
Age Distribution
The age distribution across subgroups was uneven: younger males and older females were relatively over-represented in the White group, while younger males were comparatively rare in the Black group. This imbalance in age representation could affect the model's predictive performance.
Absolute Performance Assessment
We first tested whether the data met the assumptions of parametric analysis. Tests for normality and for equality of variances both indicated that these assumptions were violated, so we could not use standard parametric methods and relied on the Kruskal-Wallis test for a more robust comparison.
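The summary does not name the exact assumption tests used; common choices are the Shapiro-Wilk test for normality and Levene's test for equal variances, sketched here on synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic per-subgroup absolute errors (stand-ins for the real data).
subgroup_errors = [rng.gamma(2.0 + 0.2 * i, 2.0, 150) for i in range(6)]

# Normality check per subgroup with the Shapiro-Wilk test.
for i, errs in enumerate(subgroup_errors):
    _, p = stats.shapiro(errs)
    print(f"subgroup {i}: Shapiro-Wilk p = {p:.3g}")

# Homogeneity of variances across subgroups with Levene's test.
_, p = stats.levene(*subgroup_errors)
print(f"Levene p = {p:.3g}")
```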
The Kruskal-Wallis test, together with the post-hoc pairwise comparisons, revealed significant differences in absolute prediction error across the racial and biological sex groups. The model performed best for White females and worst for Black males. These results underscore how data imbalances can affect model performance.
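For the post-hoc pairwise comparisons, a Conover-Iman test is available, for example, in the third-party scikit-posthocs package (the paper's own implementation may differ); a sketch on synthetic data:

```python
import numpy as np
import pandas as pd
import scikit_posthocs as sp  # third-party package, one possible implementation

rng = np.random.default_rng(2)

# Long-format table of synthetic absolute errors per subgroup.
names = ["White F", "White M", "Black F", "Black M", "Asian F", "Asian M"]
df = pd.DataFrame({
    "subgroup": np.repeat(names, 100),
    "abs_error": np.concatenate(
        [rng.gamma(2.0 + 0.2 * i, 2.0, 100) for i in range(len(names))]
    ),
})

# Conover-Iman pairwise comparisons after a significant Kruskal-Wallis result,
# with Holm correction for multiple comparisons.
pvals = sp.posthoc_conover(df, val_col="abs_error", group_col="subgroup",
                           p_adjust="holm")
print(pvals.round(4))
```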
Feature Assessment
Beyond performance, we examined the features generated by the model to see whether bias was present. Using Principal Component Analysis (PCA) for dimensionality reduction, we could visualize how the features varied by age, race, and sex, and we used two-sample Kolmogorov-Smirnov tests to check for distribution shifts between subgroups. Certain components showed clear differences between groups, suggesting that while the model aims to predict age, it might inadvertently reflect biases in the underlying data.
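A sketch of this kind of feature inspection, assuming the model's learned features have already been extracted (synthetic values are used here; a standard ResNet-34 produces a 512-dimensional feature vector before the final regression layer):

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)

# Synthetic stand-ins for the model's learned features.
features = rng.normal(size=(600, 512))
in_group_a = rng.random(600) < 0.5  # e.g. one subgroup vs. another

# Reduce the features to a few principal components.
pcs = PCA(n_components=5).fit_transform(features)

# Two-sample Kolmogorov-Smirnov test per component: does its distribution
# shift between the two subgroups?
for k in range(pcs.shape[1]):
    stat, p = ks_2samp(pcs[in_group_a, k], pcs[~in_group_a, k])
    print(f"PC{k + 1}: KS = {stat:.3f}, p = {p:.3g}")
```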
Discussion
This study highlights the need for careful consideration of demographic factors in brain age prediction models. Significant performance differences were observed, particularly affecting Black and male subjects. Given that these groups were underrepresented in the training data, it's unsurprising that the model struggles to predict their brain age accurately.
These findings raise important questions about the implications of using such models in clinical settings. For example, if these biases continue, it may lead to unequal healthcare outcomes for certain groups. Further research is necessary, not just with this model but also across other models, to see if similar biases exist.
The research shows that even a slight shift in demographic representation can yield notable variations in performance. The average discrepancies in brain age predictions can have real-world implications, especially if used as indicators of medical risks.
Limitations
While our study sheds light on these biases, there are several limitations to consider. The age range of subjects in the UK Biobank was more limited compared to the training data, which could skew the results further. Additionally, focusing on only one model type restricts our understanding of the broader implications of these biases.
Looking ahead, it would be valuable to replicate this analysis with other popular models or feature types. Different machine learning approaches could provide a more comprehensive view of how biases affect brain age prediction and ultimately patient care.
Conclusion
This research emphasizes the importance of fairness in brain age prediction models. By identifying and addressing potential biases, we can work toward improving the reliability of these models across all racial and biological sex groups. As brain age prediction tools become more integrated into clinical practice, ensuring their reliability for all patients is essential. Ongoing efforts are needed to assess biases in these models and to create algorithms that provide accurate and equitable results for everyone.
Title: Analysing race and sex bias in brain age prediction
Abstract: Brain age prediction from MRI has become a popular imaging biomarker associated with a wide range of neuropathologies. The datasets used for training, however, are often skewed and imbalanced regarding demographics, potentially making brain age prediction models susceptible to bias. We analyse the commonly used ResNet-34 model by conducting a comprehensive subgroup performance analysis and feature inspection. The model is trained on 1,215 T1-weighted MRI scans from Cam-CAN and IXI, and tested on UK Biobank (n=42,786), split into six racial and biological sex subgroups. With the objective of comparing the performance between subgroups, measured by the absolute prediction error, we use a Kruskal-Wallis test followed by two post-hoc Conover-Iman tests to inspect bias across race and biological sex. To examine biases in the generated features, we use PCA for dimensionality reduction and employ two-sample Kolmogorov-Smirnov tests to identify distribution shifts among subgroups. Our results reveal statistically significant differences in predictive performance between Black and White, Black and Asian, and male and female subjects. Seven out of twelve pairwise comparisons show statistically significant differences in the feature distributions. Our findings call for further analysis of brain age prediction models.
Authors: Carolina Piçarra, Ben Glocker
Last Update: 2023-09-19 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2309.10835
Source PDF: https://arxiv.org/pdf/2309.10835
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arxiv for use of its open access interoperability.