Using Mobility Data for Disease Modeling
How mobility data and privacy measures impact disease spread predictions.
Table of Contents
- What is Mobility Data?
- The Role of Mobility Data in Disease Modeling
- Privacy and Data Sharing
- Applying Differential Privacy to Disease Modeling
- Examining Mobility Data
- Scenarios Analyzed
- Connectivity and Disease Dynamics
- Variations in Epidemic Characteristics
- Metapopulation Models and Their Importance
- Future Directions
- Conclusion
- Original Source
The use of mobile phone data has been rising for over ten years in areas such as public health, city planning, and natural disaster response. The COVID-19 pandemic pushed this trend even further, as authorities needed to track movements and make decisions about travel restrictions and lockdowns. During the pandemic, mobility data helped model how the virus spread, allowing experts to monitor or predict the transmission of COVID-19.
What is Mobility Data?
Mobility data from mobile phones shows how people's movements change over time. This data helps us understand how people interact, trace where disease cases originate, and forecast how a virus might spread geographically. Researchers analyze this data, often collected for billing purposes or through digital platforms, to gain insight into human behavior. For instance, mobility patterns have been used to study the seasonal dynamics of diseases such as dengue and rubella in countries such as Pakistan and Kenya.
The Role of Mobility Data in Disease Modeling
During the COVID-19 pandemic, mobility data became crucial. Researchers used it to create models that showed how human movement affected the spread of the virus. These models also predicted how the epidemic would unfold and estimated how effective measures like lockdowns and social distancing would be.
Despite the usefulness of these datasets, privacy remains a big concern. Even when data is deidentified and aggregated, people worry about their personal information being used without their consent. Currently, there aren't standardized agreements or guidelines to ensure privacy while also benefiting public health efforts.
Privacy and Data Sharing
As technology continues to evolve, the amount of data available grows quickly. This makes it easier for companies or individuals to re-identify data previously considered anonymous. To address these privacy issues, various frameworks have been developed. One noteworthy approach is called Differential Privacy (DP), which helps balance privacy and the utility of the data.
DP works by adding random noise to data, which makes it harder for someone to identify specific individuals from the dataset. This way, researchers can still obtain useful information while protecting individual privacy. The concept of DP is being adopted by many companies and government agencies, but how it should be applied to mobility data in disease modeling is still not fully clear.
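To make the idea of "adding random noise" concrete, here is a minimal sketch of one common DP building block, the Laplace mechanism, applied to a single count. The article does not say which mechanism or sensitivity the original study used, so the function names, the sensitivity of 1, and the choice of Laplace noise are all assumptions made for illustration.

```python
import random


def laplace_scale(epsilon, sensitivity=1.0):
    """Noise scale of the Laplace mechanism: sensitivity divided by epsilon."""
    return sensitivity / epsilon


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Return a noisy version of a count (assumed setup, not the study's code)."""
    rng = rng or random.Random()
    return true_count + laplace_noise(laplace_scale(epsilon, sensitivity), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical count of 100 trips, released with epsilon = 0.05.
    print(laplace_scale(0.05))
    print(privatize_count(100, epsilon=0.05, rng=rng))
```

A smaller epsilon means a larger noise scale and therefore stronger privacy but noisier counts. Under these assumptions, a sensitivity of 1 and an epsilon of 0.05 give a noise scale of 20, which happens to match the figures quoted in the source abstract, though that correspondence should not be read as a description of the authors' implementation.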
Applying Differential Privacy to Disease Modeling
In this article, we look into how DP can be used in models that predict the spread of infectious diseases. We examine how different levels of noise affect important features of an epidemic by running simulations. Our method builds on an existing model and analyzes how adding noise impacts key outputs related to disease spread.
We used real mobility data from New York State during the early COVID-19 pandemic to showcase how applying differential privacy can influence certain metrics associated with disease spread. Our findings suggest that using DP may alter some estimates but can still provide significant privacy protection.
Examining Mobility Data
The mobility data we analyzed was collected from around August 15 to November 15, 2020. This data included a large number of transitions (more than 800,000) between counties in New York, averaging about 9,000 transitions each day. The number of transitions varied greatly between counties, with some having very few and others having many. After adding DP, the total number of reported transitions changed, but the general ranking of travel routes remained consistent.
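The observation that totals shift while route rankings stay put can be illustrated with a small sketch. The route labels and trip counts below are made-up toy values, not figures from the New York dataset, and the Laplace noise with a scale of 20 is an assumption rather than the study's documented procedure.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_matrix(counts, scale, rng):
    """Add independent noise to every route count, then round and clamp at zero."""
    return {
        route: max(0, round(n + laplace_noise(scale, rng)))
        for route, n in counts.items()
    }


def ranking(counts):
    """Routes ordered from most to least travelled."""
    return sorted(counts, key=counts.get, reverse=True)


# Toy origin-destination counts between hypothetical counties A, B and C.
toy_counts = {
    ("A", "B"): 9000,
    ("B", "A"): 4000,
    ("A", "C"): 1500,
    ("C", "A"): 600,
    ("B", "C"): 200,
}

rng = random.Random(0)
noisy_counts = privatize_matrix(toy_counts, scale=20.0, rng=rng)

print("true total: ", sum(toy_counts.values()))
print("noisy total:", sum(noisy_counts.values()))
print("ranking preserved:", ranking(toy_counts) == ranking(noisy_counts))
```

Because the gaps between these toy routes are far larger than the noise scale, the ordering survives even though individual counts move. Routes with similar counts would be much easier to swap.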
Scenarios Analyzed
We looked at several scenarios to see how adding noise through DP affected the spread of diseases. Our first scenario considered starting outbreaks in both large and small counties. Large counties, like Kings and Queens, have populations of around 2 million, while the smallest counties, like Allegany and Essex, have populations around 46,000 and 37,000, respectively.
When starting an outbreak in large counties, we saw that it began around day 50 and peaked around day 75, with about 1% of the population infected. In the smaller counties, however, the outbreak started later, around day 60, and peaked around day 150, affecting about 5% of the population.
We tested various combinations of scenarios and noise levels over 1,000 iterations. When the epidemic started in larger counties, metrics related to the size of the outbreak and the number of counties with cases were higher than when it started in smaller counties. However, when we introduced significant noise, many of the initial estimates dropped, while rates for spread and other factors increased.
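The structure of such an experiment, repeating noisy draws many times at each noise level, can be sketched as below. This is not the study's pipeline: the two route sizes, the list of noise scales, and the use of a simple relative error on a single count (rather than an epidemic metric) are stand-ins chosen to keep the example short.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mean_relative_error(true_count, scale, n_iter, rng):
    """Average relative error of a noisy, zero-clamped count over n_iter draws."""
    total = 0.0
    for _ in range(n_iter):
        noisy = max(0.0, true_count + laplace_noise(scale, rng))
        total += abs(noisy - true_count) / true_count
    return total / n_iter


rng = random.Random(42)
noise_scales = [1.0, 5.0, 20.0, 100.0]             # hypothetical noise levels
routes = {"busy route": 9000, "quiet route": 50}   # toy daily trip counts

results = {}
for scale in noise_scales:
    for name, count in routes.items():
        results[(name, scale)] = mean_relative_error(count, scale, 1000, rng)
        print(f"scale={scale:>5}  {name:<11}  error={results[(name, scale)]:.4f}")
```

The same absolute noise is a far larger fraction of a sparsely travelled route than of a busy one, which is one plausible reason why scenarios seeded in small or poorly connected counties could be more sensitive to noise.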
Connectivity and Disease Dynamics
Next, we looked at how the level of connectivity between counties affected the predictions. We simulated outbreaks in pairs of counties with no, low, or medium connectivity to a larger county (Kings County). Where connectivity was low, the number of infections remained quite low, while medium connectivity produced a slightly higher infection rate.
We found that increased noise generally led to a decline in important metrics about the disease spread. This pattern continued across various scenarios, including when outbreaks started in smaller counties and in counties that were poorly connected within the mobility network.
Variations in Epidemic Characteristics
To delve deeper into the nature of spreading diseases, we tested several potential changes in the trajectory of outbreaks in Kings and Queens counties. We looked at how a faster epidemic spread due to increased transmission rates affected the metrics. We also considered scenarios with high versus low numbers of asymptomatic individuals.
When we increased the transmission rate, the peak of the epidemic came much sooner, and more individuals got infected than in previous simulations. However, as noise increased, the metrics showed either conservative behaviors or more erratic results.
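The effect of a higher transmission rate on timing and size can be reproduced with a textbook single-population SIR model. The sketch below is a generic discrete-time version, not the model used in the study, and the transmission rates, recovery rate, and population size are illustrative assumptions.

```python
def sir_peak(beta, gamma=0.1, population=1_000_000, seed_cases=10.0, days=400):
    """Run a daily-step SIR model; return (peak day, fraction ever infected)."""
    S, I, R = population - seed_cases, seed_cases, 0.0
    peak_day, peak_infectious = 0, I
    for day in range(1, days + 1):
        new_infections = beta * S * I / population
        new_recoveries = gamma * I
        S -= new_infections
        I += new_infections - new_recoveries
        R += new_recoveries
        if I > peak_infectious:
            peak_day, peak_infectious = day, I
    return peak_day, (population - S) / population


# Hypothetical "baseline" and "faster" transmission rates.
slow_peak, slow_size = sir_peak(beta=0.3)
fast_peak, fast_size = sir_peak(beta=0.5)
print("baseline:", slow_peak, round(slow_size, 3))
print("faster:  ", fast_peak, round(fast_size, 3))
```

With the higher rate the peak arrives earlier and a larger share of the population is infected, which is the qualitative pattern described above.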
Metapopulation Models and Their Importance
Throughout the COVID-19 pandemic, researchers developed various metapopulation models to inform decisions, anticipate the disease's spread, and identify weaknesses in healthcare systems. Mobility data has been vital in these models, providing insights into different geographical and behavioral factors among populations. However, there is still concern that such data could potentially reveal specific individuals' travel behaviors, which is why privacy measures are critical.
Our research indicates that when metapopulation models use mobility data, important metrics remain close to their noise-free values across a range of noise levels. Up to a certain point (the abstract of the original source reports noise below 20 in the count transition matrix, corresponding to a parameter of 0.05 per release), adding noise protects individual privacy while still allowing good estimates of public health metrics.
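For readers unfamiliar with metapopulation models, the sketch below couples three SIR "patches" through a trip matrix and compares two summary metrics with and without noise on the trips. It is a deliberately simple commuting-style model, not the one used in the study. The populations, trip counts, rates, and the Laplace noise scale of 20 are all toy assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_noise(trips, scale, rng):
    """Noisy copy of a trip matrix, clamped at zero, with the diagonal kept at 0."""
    n = len(trips)
    return [[max(0.0, trips[i][j] + laplace_noise(scale, rng)) if i != j else 0.0
             for j in range(n)] for i in range(n)]


def simulate(populations, trips, seed_patch, beta=0.3, gamma=0.1,
             days=300, seed_cases=10.0):
    """Daily-step SIR in each patch; residents are exposed at home and on trips."""
    n = len(populations)
    S = [float(p) for p in populations]
    I = [0.0] * n
    R = [0.0] * n
    S[seed_patch] -= seed_cases
    I[seed_patch] += seed_cases
    # Fraction of each patch's residents visiting each other patch per day.
    frac = [[trips[i][j] / populations[i] if i != j else 0.0 for j in range(n)]
            for i in range(n)]
    infectious_by_day = []
    for _ in range(days):
        infectious_by_day.append(sum(I))
        prevalence = [I[k] / populations[k] for k in range(n)]
        for i in range(n):
            away = sum(frac[i])
            exposure = (1.0 - away) * prevalence[i] + sum(
                frac[i][j] * prevalence[j] for j in range(n))
            new_infections = beta * exposure * S[i]
            new_recoveries = gamma * I[i]
            S[i] -= new_infections
            I[i] += new_infections - new_recoveries
            R[i] += new_recoveries
    return {
        "peak_day": infectious_by_day.index(max(infectious_by_day)),
        "attack_rate": (sum(I) + sum(R)) / sum(populations),
    }


# Toy setup: one large, one medium and one small patch (hypothetical values).
populations = [2_000_000, 500_000, 40_000]
trips = [[0, 5000, 300],
         [4000, 0, 200],
         [300, 150, 0]]

baseline = simulate(populations, trips, seed_patch=0)
noisy = simulate(populations, add_noise(trips, 20.0, random.Random(1)), seed_patch=0)
print("baseline:", baseline)
print("noisy:   ", noisy)
```

In this toy setting the noise is small relative to the trip counts, so the two runs produce very similar metrics. Raising the noise scale or shrinking the trip counts is a quick way to explore where that stops being true.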
Future Directions
While our findings focus on a specific combination of mobility data and modeling techniques, they present a flexible framework. This can help researchers assess the trade-off between privacy and accuracy as mobility data becomes more widely used in disease modeling.
As a next step, researchers could use our findings to evaluate how different levels of privacy-preserving noise affect their specific models and data sets. This could ultimately contribute to better privacy protection while ensuring public health data remains useful for analysis.
Conclusion
The use of mobile phone data presents both opportunities and challenges in fields like public health and epidemiology. While these data sets can reveal important insights, especially during epidemics, it is crucial to protect the privacy of individuals. Differential privacy offers a promising solution by providing a way to analyze mobility data while protecting personal information. As methodologies evolve, the balance between privacy and utility will remain an essential consideration for researchers and decision-makers alike.
Original Source
Title: A standardised differential privacy framework for epidemiological modelling with mobile phone data
Abstract: During the COVID-19 pandemic, the use of mobile phone data for monitoring human mobility patterns has become increasingly common, both to study the impact of travel restrictions on population movement and epidemiological modelling. Despite the importance of these data, the use of location information to guide public policy can raise issues of privacy and ethical use. Studies have shown that simple aggregation does not protect the privacy of an individual, and there are no universal standards for aggregation that guarantee anonymity. Newer methods, such as differential privacy, can provide statistically verifiable protection against identifiability but have been largely untested as inputs for compartment models used in infectious disease epidemiology. Our study examines the application of differential privacy as an anonymisation tool in epidemiological models, studying the impact of adding quantifiable statistical noise to mobile phone-based location data on the bias of ten common epidemiological metrics. We find that many epidemiological metrics are preserved and remain close to their non-private values when the true noise state is less than 20, in a count transition matrix, which corresponds to a privacy-less parameter ε = 0.05 per release. We show that differential privacy offers a robust approach to preserving individual privacy in mobility data while providing useful population-level insights for public health. Importantly, we have built a modular software pipeline to facilitate the replication and expansion of our framework.
Author Summary: Human mobility data has been used broadly in epidemiological population models to better understand the transmission dynamics of an epidemic, predict its future trajectory, and evaluate potential interventions. The availability and use of these data inherently raises the question of how we can balance individual privacy and the statistical utility of these data.
Unfortunately, there are few existing frameworks that allow us to quantify this trade-off. Here, we have developed a framework to implement a differential privacy layer on top of human mobility data which can guarantee a minimum level of privacy protection and evaluate their effects on the statistical utility of model outputs. We show that this set of models and their outputs are resilient to high levels of privacy-preserving noise and suggest a standard privacy threshold with an epsilon of 0.05. Finally, we provide a reproducible framework for public health researchers and data providers to evaluate varying levels of privacy-preserving noise in human mobility data inputs, models, and epidemiological outputs.
Authors: Nishant Kishore, M. K. Savi, A. Yadav, W. Zhang, N. Vembar, A. Schroeder, S. Balsari, C. O. Buckee, S. Vadhan
Last Update: 2023-03-23 00:00:00
Language: English
Source URL: https://www.medrxiv.org/content/10.1101/2023.03.16.23287382
Source PDF: https://www.medrxiv.org/content/10.1101/2023.03.16.23287382.full.pdf
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to medrxiv for use of its open access interoperability.