Predicting Tourist Numbers with Deep Learning
This study examines deep learning models for forecasting visitor numbers in tourism.
As more people travel and population numbers grow, cultural tourism spots are seeing more visitors. However, the COVID-19 pandemic has changed how these places manage crowds and maintain safety. Social distancing rules and visitor limits have made it crucial for tourist destinations to find ways to keep both urban and natural environments sustainable while preventing long lines and overcrowding.
Tourists’ feelings about health risks and safety can be affected by these distancing measures. To follow the 2030 Agenda for Sustainable Development set by the United Nations, the tourism industry must meet several goals. These include creating sustainable cities, encouraging responsible consumption, and supporting economic growth. Sustainable tourism aims to manage visitor numbers, protect natural sites, cut emissions and waste, promote sustainable energy use, ensure harmony between locals and tourists, and enhance visitor satisfaction for economic benefits.
The lack of reliable data in real-life situations is often due to issues with compliance, data collection, and data sharing. While researchers can use nonpersonal data from points of interest (POIs), tourist facilities, and anonymized digital device data, there’s concern over location data gathered by mobile apps. It is essential to think about whether users know what they are sharing when they use these services, even if the data isn't explicitly personal.
This study looks at what can be achieved using the available data in predicting visitor numbers at tourist spots, recognizing that data scarcity is a common issue. The first step in managing tourist numbers is to accurately predict how visitors move and behave. However, this is difficult because factors like weather, cultural events, holidays, and regional hotspots can influence visitor trends throughout the day. With the growth of large datasets and advanced computational tools, deep neural networks have become one of the top methods for predicting time-series data, including tourist flow.
In this research, we focus on predicting visitor numbers using a local dataset from Salzburg's tourist attractions, along with additional location data about individual tourists. After preparing the data, we compare different deep-learning methods for predicting visitor numbers with ARIMA, a traditional statistical method. ARIMA has been widely used since the 1970s for short-term forecasts, like traffic predictions.
Specific Contributions
Our paper has several key contributions:
- We conduct a thorough comparison of deep learning models and ARIMA, highlighting the strengths and weaknesses of each approach.
- We make hourly visitor predictions for each point of interest, which is essential for managing tourist numbers.
- We evaluate modern deep learning techniques such as Transformers and Graph Neural Networks (GNNs).
- To our knowledge, we are the first to apply a wide range of deep learning models to the task of predicting tourist flow.
Related Work
Predicting tourist numbers has become a significant focus of research. Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM) networks, have been applied to forecast tourist demand. However, most studies have only explored a limited number of models.
Several studies focus on long-term estimates of tourist numbers, measuring demand at city or country levels. For managing visitor flows, it’s crucial to make hourly predictions for specific locations. Time-series predictions often use RNNs, which can remember past information to make better forecasts. Variants of RNNs, like gated RNNs, can perform well, but they face challenges when dealing with irregularly timed data, as is common in tourist flow predictions.
Continuous-time models, such as CT-RNNs and NeuralODEs, define hidden states as solutions to ordinary differential equations. This gives them useful properties, like adaptive computation and memory-efficient training. Transformer-based models have also gained traction due to their effectiveness in sequence-learning tasks. They have been applied to time-series forecasting thanks to their ability to capture long-range correlations through self-attention mechanisms.
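To illustrate why continuous-time models suit irregularly timed observations, the following is a minimal sketch of a single-neuron CT-RNN advanced with explicit Euler steps over an arbitrary time gap. The function name, the parameters (tau, w, u, substeps) and the choice of Euler integration are assumptions made for illustration and are not taken from the study.

```python
from math import tanh


def ctrnn_advance(h, x, elapsed, tau=2.0, w=0.0, u=1.0, substeps=10):
    """Advance a scalar CT-RNN state over `elapsed` time units.

    Assumed dynamics: dh/dt = -h / tau + tanh(w * h + u * x),
    integrated with explicit Euler steps (illustrative choice).
    """
    dt = elapsed / substeps
    for _ in range(substeps):
        h += dt * (-h / tau + tanh(w * h + u * x))
    return h


# With no input, the state decays more over a longer gap between observations.
SHORT_GAP = ctrnn_advance(1.0, 0.0, elapsed=0.5)
LONG_GAP = ctrnn_advance(1.0, 0.0, elapsed=3.0)
```

Because the elapsed time enters the update directly, the same cell can consume observations that arrive at uneven intervals without resampling them onto a fixed grid.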
GNNs represent a new way of handling data structured as graphs, allowing for better representation of complex relationships in data. Temporal GNNs combine the features of graphs with RNNs and have been effectively used in traffic prediction.
Traditional methods like ARIMA have been commonly used to forecast time-series data. They provide a baseline for evaluating newer deep learning models.
Data
To train the models for predicting tourist numbers, we combined two different datasets to leverage their unique attributes. The first dataset we used is from the "Salzburg Card," which provides access to various attractions. The data includes timestamps for entries to these locations, along with information about weather and holidays in Austria.
We also incorporated mobile phone location data from a third-party service to improve our predictions. This dataset represents around 3% of tourists, giving insights into visitor numbers between points of interest. However, this data is limited and irregular. To address this, we included a street graph from OpenStreetMap, mapping the location data to the nearest nodes and aggregating visitor counts per hour.
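The mapping and aggregation step described above can be sketched as follows. This is only an illustration under stated assumptions: the node coordinates, the ping tuples and all function names are hypothetical, the nearest node is found by brute-force great-circle distance rather than a spatial index, and the sketch counts pings rather than distinct devices.

```python
from collections import Counter
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def nearest_node(lat, lon, nodes):
    """Return the id of the graph node closest to (lat, lon)."""
    return min(nodes, key=lambda n: haversine_m(lat, lon, nodes[n][0], nodes[n][1]))


def hourly_counts(pings, nodes):
    """Count location pings per (node id, hour) bucket."""
    counts = Counter()
    for ts, lat, lon in pings:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(nearest_node(lat, lon, nodes), hour)] += 1
    return counts


# Hypothetical node coordinates and pings, for illustration only.
NODES = {"a": (47.8000, 13.0400), "b": (47.8100, 13.0500)}
PINGS = [
    (datetime(2019, 7, 1, 10, 5), 47.8001, 13.0401),
    (datetime(2019, 7, 1, 10, 40), 47.8002, 13.0399),
    (datetime(2019, 7, 1, 11, 15), 47.8099, 13.0502),
]
COUNTS = hourly_counts(PINGS, NODES)
```

For a real street graph with many nodes, a spatial index would replace the linear scan in nearest_node.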
During the COVID-19 pandemic, tourism drastically declined in Salzburg, leading to challenges in prediction accuracy when using models trained on pre-pandemic data.
Our dataset included hourly visitor data from attractions and was enriched by geolocation data. Including multiple data sources is crucial for improving real-world predictions. Sparse geolocation data was included in our GNN model as features, allowing for the integration of new data sources in the future.
We then made predictions using various models, comparing their performance comprehensively.
Deep Learning Models
We used a wide range of RNN variations on the dataset to compare the performance of advanced models. This included vanilla RNN, LSTM, phased-LSTM, GRU-D, CT-RNN, CT-LSTM, and Neural-ODE networks. We also utilized a Transformer model to predict tourist flow and applied a basic GNN model to our prediction problem to integrate geolocation data.
We structured our GNN model to use the street layout of Salzburg, with one neuron for each node in the street graph. Combining recurrent kernels with the graph's adjacency matrix yielded a model that kept favorable properties such as differentiable dynamics.
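One plausible reading of combining a recurrent kernel with an adjacency matrix is to mask the recurrent weights element-wise, so that each node's neuron only receives state from its graph neighbours. The sketch below assumes exactly that; the function name, the tanh activation, the weights and the three-node path graph are illustrative and not taken from the study.

```python
from math import tanh


def masked_recurrent_step(h, x, w_rec, w_in, adj):
    """One update of a recurrent layer with one neuron per graph node.

    The weight w_rec[i][j] only contributes where adj[i][j] == 1, so node i
    is updated from its own input and the states of its neighbours.
    """
    n = len(h)
    new_h = []
    for i in range(n):
        pre = w_in[i] * x[i]
        for j in range(n):
            pre += adj[i][j] * w_rec[i][j] * h[j]
        new_h.append(tanh(pre))
    return new_h


# Hypothetical path graph 0 - 1 - 2 with self-loops.
ADJ = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
W_REC = [[0.5] * 3 for _ in range(3)]
W_IN = [1.0, 1.0, 1.0]

H1 = masked_recurrent_step([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], W_REC, W_IN, ADJ)
H2 = masked_recurrent_step(H1, [0.0, 0.0, 0.0], W_REC, W_IN, ADJ)
H3 = masked_recurrent_step(H2, [0.0, 0.0, 0.0], W_REC, W_IN, ADJ)
```

In this toy run, a signal injected at node 0 reaches node 1 after two steps and node 2 after three, which shows how the mask makes information travel along the street graph one edge per step.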
Traditional Methods
In this study, we implemented a non-seasonal ARIMA model. It does not consider seasonal patterns and uses past data to predict future values. We employed the auto.arima function from a forecasting library to automatically determine the best parameters for each point of interest. The ARIMA model was fitted individually for each POI, with new visitor data added continuously to improve ongoing predictions.
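The per-POI fitting and continuous updating can be sketched as a rolling one-step forecast. Note the substitution: instead of auto.arima, the sketch hard-codes an ARIMA(1,1,0) model with no intercept, estimated by least squares, so that it stays dependency-free. The function names and the sample series are hypothetical.

```python
def fit_ar1_on_differences(series):
    """Least-squares estimate of phi in d[t] = phi * d[t-1] + e[t],
    where d is the first difference of the series."""
    d = [b - a for a, b in zip(series, series[1:])]
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    return num / den if den else 0.0


def forecast_next(series, phi):
    """One-step forecast: last value plus phi times the last difference."""
    return series[-1] + phi * (series[-1] - series[-2])


def rolling_forecasts(history_by_poi, new_by_poi):
    """For each POI, forecast every new hour, then append the observed
    count and refit before forecasting the following hour."""
    out = {}
    for poi, history in history_by_poi.items():
        seen = list(history)
        preds = []
        for actual in new_by_poi[poi]:
            phi = fit_ar1_on_differences(seen)
            preds.append(forecast_next(seen, phi))
            seen.append(actual)
        out[poi] = preds
    return out


# Hypothetical hourly entry counts for two POIs.
HISTORY = {"fortress": [1, 2, 3, 4, 5], "museum": [10, 10, 10, 10]}
NEW = {"fortress": [6, 7], "museum": [10]}
PREDS = rolling_forecasts(HISTORY, NEW)
```

A full implementation would select the model orders automatically for each POI, as the text describes, rather than fixing them in advance.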
The initial experiments used data from 2017 to 2019, focusing on hourly entries to each location together with features such as holidays and weather. Preprocessing involved transforming the seasonal data and normalizing the inputs for better performance.
Average Prediction Errors
We evaluated the performance of our models in terms of prediction accuracy and execution time. The results indicated that deep learning models outperformed ARIMA across several metrics, especially when including additional features. Although adding more features did not always lead to better outcomes, the deep learning models showed more flexibility in handling varied data compared to ARIMA.
Main Results
Through a series of experiments, we compared different models, focusing on Mean-Absolute-Error (MAE) and Root-Mean-Squared-Error (RMSE). The deep learning models consistently showed better performance than ARIMA, especially when enriched with external features.
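For reference, the two error metrics can be computed as in the following minimal sketch. The sample values are made up for illustration and are not results from the study.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE does."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical hourly visitor counts and forecasts.
ACTUAL = [10, 20, 30, 40]
PREDICTED = [12, 18, 33, 40]
MAE_VALUE = mae(ACTUAL, PREDICTED)    # 1.75
RMSE_VALUE = rmse(ACTUAL, PREDICTED)  # sqrt(4.25), about 2.06
```

RMSE is never smaller than MAE on the same data, so a large gap between the two hints at a few hours with unusually large errors.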
We visually analyzed predictions against actual visitor numbers for selected attractions, revealing that while ARIMA struggled with certain data patterns, deep learning models adapted better to fluctuations in visitor flow. However, there were instances where ARIMA performed comparably well.
Including Geolocation Data
In a second phase of experiments, we expanded our analysis to include geolocation data for individual tourists between 2019 and 2021. In this context, our results suggested that the naive method of using the last known value yielded surprisingly good outcomes, outperforming many advanced models. However, Transformers excelled when using external features, thanks to their ability to manage multiple data inputs effectively.
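The naive last-value method mentioned above, often called a persistence forecast, is simple enough to state in a few lines. The sketch below uses hypothetical hourly counts and scores the baseline with MAE. It is an illustration, not the study's evaluation code.

```python
def persistence_forecast(series):
    """Predict each value as the previous observation.

    The first observation has no forecast, so the output is one
    element shorter than the input and aligns with series[1:].
    """
    return series[:-1]


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical hourly visitor counts at one POI.
COUNTS = [5, 7, 6, 6, 9]
BASELINE = persistence_forecast(COUNTS)
BASELINE_MAE = mae(COUNTS[1:], BASELINE)  # errors 2, 1, 0, 3 give 1.5
```

Such a baseline costs nothing to compute, which makes it a useful sanity check before crediting a more complex model with an improvement.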
Our GNN model demonstrated its advantages by using sparse geolocation data, proving useful in real-world scenarios where such data tends to be limited.
Conclusions and Future Work
This study highlighted the effectiveness of deep learning in forecasting tourist flows, emphasizing the value of integrating external features. Deep learning models outperformed traditional methods like ARIMA, with improved speed in making predictions. GNNs showed particular promise in handling sparse geolocation data.
Future work could include developing methods to enhance deep learning performance, exploring the potential of Vector Auto-Regression approaches, and creating specialized models aimed at improving short-term predictions. Ultimately, the goal is to assist stakeholders in making informed decisions that support sustainable tourism efforts.
Title: Prediction of Tourism Flow with Sparse Geolocation Data
Abstract: Modern tourism in the 21st century is facing numerous challenges. Among these the rapidly growing number of tourists visiting space-limited regions like historical cities, museums and bottlenecks such as bridges is one of the biggest. In this context, a proper and accurate prediction of tourism volume and tourism flow within a certain area is important and critical for visitor management tasks such as sustainable treatment of the environment and prevention of overcrowding. Static flow control methods like conventional low-level controllers or limiting access to overcrowded venues could not solve the problem yet. In this paper, we empirically evaluate the performance of state-of-the-art deep-learning methods such as RNNs, GNNs, and Transformers as well as the classic statistical ARIMA method. Granular limited data supplied by a tourism region is extended by exogenous data such as geolocation trajectories of individual tourists, weather and holidays. In the field of visitor flow prediction with sparse data, we are thereby capable of increasing the accuracy of our predictions, incorporating modern input feature handling as well as mapping geolocation data on top of discrete POI data.
Authors: Julian Lemmel, Zahra Babaiee, Marvin Kleinlehner, Ivan Majic, Philipp Neubauer, Johannes Scholz, Radu Grosu, Sophie A. Neubauer
Last Update: 2023-08-28 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2308.14516
Source PDF: https://arxiv.org/pdf/2308.14516
Licence: https://creativecommons.org/licenses/by-nc-sa/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arxiv for use of its open access interoperability.