New Dataset Reveals Trends in Vertebrate Diversity
Research dataset on diversification metrics for five groups of vertebrates released.
Research on biodiversity has mostly been dominated by wealthier countries. This is largely due to their access to money and technology. However, more researchers and institutions are pushing for open science policies. These policies aim to make scientific knowledge available to everyone, especially in regions that lack financial support or technological resources. By sharing data and research more freely, science can progress and benefit a wider audience.
Over the last few years, data papers have become quite common in scientific journals. These papers present significant amounts of data, which can help other researchers who need such information but are not focused on collecting it themselves. This can save time and reduce costs for researchers. A great example of this is the WorldClim dataset, which has been cited over 20,000 times since its release.
This study aims to provide a dataset on diversification metrics for five major groups of vertebrates. It includes matrices that show the presence or absence of species. Generating this data requires considerable computing power and some advanced programming skills, making this dataset quite valuable for those interested in the fields of macroecology and macroevolution. The data was gathered during an analysis driven by curiosity, and the intention is to share it with the community. A brief description of the dataset will be provided, but a detailed analysis of the reasons behind the observed biodiversity trends won’t be offered.
Data Collection
Phylogenetic Sampling
This project used fully sampled, time-calibrated supertrees for five groups of vertebrates: amphibians, reptiles, birds, mammals, and sharks. These trees were collected from various sources and are available online. Because these phylogenies combine species that have DNA data with species that lack it, the placement of some species is uncertain and can differ from tree to tree. To account for this uncertainty, 100 random trees were selected for each group for the analysis.
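The tree-sampling step can be sketched in a few lines of Python. This is only an illustration: the number of trees available per group (10,000 here) and the seed are assumptions made for the example, not values taken from the study.

```python
import random

def sample_tree_indices(n_available, n_sample=100, seed=42):
    """Draw n_sample distinct tree indices without replacement."""
    if n_sample > n_available:
        raise ValueError("cannot sample more trees than are available")
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return sorted(rng.sample(range(n_available), n_sample))

# Hypothetical pool sizes; the real number of trees per group may differ.
available_trees = {
    "amphibians": 10000,
    "reptiles": 10000,
    "birds": 10000,
    "mammals": 10000,
    "sharks": 10000,
}

sampled = {group: sample_tree_indices(n) for group, n in available_trees.items()}
```

Fixing the seed means the same 100 trees can be recovered later, which matters when results from different metrics are compared tree by tree.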
Diversification Rates
To compare how different groups evolve over time, two different methods for calculating diversification rates were used. One is called DivRate, which estimates the rate of speciation in a straightforward way. It measures how many species emerge from a common ancestor. The other method, BAMM, uses a more complex approach that allows for different rates of speciation and extinction over time.
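The text does not spell out how DivRate is computed. As a rough illustration of what a simple tip-level rate can look like, the sketch below assumes it behaves like the widely used DR statistic, the inverse of the equal-splits measure: each branch between a species and the root is added up, with the weight halving at every step toward the root. The toy tree and all names are hypothetical.

```python
def dr_statistic(parents, tip):
    """Inverse equal-splits rate for one tip.

    parents maps each node to (parent, branch_length); the root has no entry.
    """
    equal_splits = 0.0
    weight = 1.0  # terminal branch counts fully
    node = tip
    while node in parents:
        parent, length = parents[node]
        equal_splits += length * weight
        weight /= 2.0  # each deeper branch counts half as much
        node = parent
    return 1.0 / equal_splits

# Toy tree ((A:1,B:1):1,C:2); node names are made up for the example.
toy_tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 1.0),
    "C": ("root", 2.0),
}

rates = {tip: dr_statistic(toy_tree, tip) for tip in ("A", "B", "C")}
```

In this toy tree, A and B sit on short recent branches and get a higher rate than C, which sits on one long branch. BAMM, by contrast, fits a model and cannot be reproduced in a few lines.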
In the analyses, 100 trees for each vertebrate group were studied using both methods. Due to the heavy computational demands, the analysis was carried out on a high-performance server. The results were then processed with specific software to ensure accuracy. It took about six months to run all analyses because of the large amounts of data involved.
Presence-Absence Matrices
To organize the data further, presence-absence matrices were created using distribution polygons from the IUCN Red List, a well-known resource for biodiversity data. For amphibians, reptiles, mammals, and sharks, the most recent dataset was used. For birds, an earlier dataset was referenced. Invalid geometries of the polygons were resolved using GIS software before generating the presence-absence matrices.
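To make the idea of a presence-absence matrix concrete, here is a minimal sketch with no GIS library. It assumes a simplification that the study may not have used: a species counts as present in a grid cell when the cell center falls inside its range polygon. The species names, polygons, and grid are invented for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def presence_absence_matrix(ranges, cell_centers):
    """Rows are grid cells, columns are species (sorted by name)."""
    species = sorted(ranges)
    matrix = [
        [1 if point_in_polygon(x, y, ranges[sp]) else 0 for sp in species]
        for (x, y) in cell_centers
    ]
    return species, matrix

# Hypothetical ranges and a 2 x 2 grid of cell centers.
ranges = {
    "sp_a": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "sp_b": [(0, 0), (2, 0), (2, 2), (0, 2)],
}
cell_centers = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]

species, pam = presence_absence_matrix(ranges, cell_centers)
richness = [sum(row) for row in pam]  # species count per cell
```

Summing each row gives species richness per cell, the quantity behind richness maps.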
Taxonomic Harmonization
To ensure consistency, unique identifiers from the Global Biodiversity Information Facility (GBIF) were included for each species. These identifiers help track species regardless of name changes over time. Species names were matched using a built-in GBIF tool that tries to align names through exact matches or close (fuzzy) matches. Some species could not be matched this way, so additional measures were used to improve the matching of species names.
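The exact-then-fuzzy logic can be illustrated with Python's standard library. This is not GBIF's matching tool, only a sketch of the same idea; the reference table, its identifiers, and the similarity cutoff are all made up for the example.

```python
import difflib

def match_name(name, backbone, cutoff=0.9):
    """Return (identifier, match_type) for a species name.

    backbone maps accepted names to identifiers (hypothetical values here).
    """
    if name in backbone:
        return backbone[name], "exact"
    close = difflib.get_close_matches(name, list(backbone), n=1, cutoff=cutoff)
    if close:
        return backbone[close[0]], "fuzzy"
    return None, "none"  # left for manual review

# Illustrative reference table; the identifiers are NOT real GBIF keys.
backbone = {
    "Bolitoglossa ramosi": 1001,
    "Anolis carolinensis": 1002,
}

results = {
    name: match_name(name, backbone)
    for name in ["Anolis carolinensis", "Bolitoglosa ramosi", "Nonexistus fictus"]
}
```

Names that come back as "none" are the ones that would need the additional measures mentioned above.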
Results
Based on the collected data, diversification rates were similarly low across the vertebrate groups. However, the metrics did not agree in every respect, and BAMM displayed less variation. Despite uncertainties in the data, speciation rates remained consistent across two of the three metrics analyzed.
The presence-absence matrices successfully identified patterns of biodiversity, showing higher species richness in tropical regions. There were slight variations in species richness across different regions, reflecting the known patterns of diversity. In total, a considerable number of species were common between the datasets.
Geographical Patterns
The geographical patterns of diversification rates can vary depending on the method used. For instance, the BAMM method indicated that amphibians had higher speciation rates in specific regions, while the DivRate method showed different trends. Similarly, reptiles showed varied results based on the metric used. For mammals, BAMM indicated higher rates in marine species across tropical regions, while DivRate reflected elevated rates in certain continental regions.
The patterns for birds showed a more even distribution of speciation rates, but peaks were noted in specific areas. For sharks, the BAMM method produced lower overall rates, while DivRate indicated higher rates in oceanic regions.
Discussion
While this dataset is not the first of its kind, it offers a well-organized compilation of data that is openly available for research. It can provide insights into patterns of macroecology and macroevolution, which are important for understanding how different species evolve over time. The dataset allows researchers to link diversification rates with geographical or climatic data to look for patterns at a broader scale.
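One way to link rates with geography is to summarize, for every grid cell, the rates of the species present there. The sketch below assumes a presence-absence matrix with cells as rows and species as columns, plus a table of per-species rates; the layout, names, and numbers are illustrative and may not match the released files.

```python
from statistics import median

def cell_median_rates(species, matrix, tip_rates):
    """Median rate of the species present in each cell (None for empty cells)."""
    summary = []
    for row in matrix:
        present = [
            tip_rates[sp]
            for sp, flag in zip(species, row)
            if flag == 1 and sp in tip_rates
        ]
        summary.append(median(present) if present else None)
    return summary

# Hypothetical inputs: three species, three grid cells.
species = ["sp_a", "sp_b", "sp_c"]
matrix = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
tip_rates = {"sp_a": 0.05, "sp_b": 0.10, "sp_c": 0.60}

per_cell = cell_median_rates(species, matrix, tip_rates)
```

The resulting list has one value per cell and can be joined to climate or elevation data that share the same grid.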
However, it is essential to consider the right-skewed nature of the data when analyzing it. Because a few species with very high rates pull the mean upward, averages can be misleading. Medians or relative measures are likely to describe the typical species better. Furthermore, the way data is presented influences its interpretation, especially when the range of variation is small.
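A small numerical example shows why the choice of summary matters for skewed data. The rates below are invented, not taken from the dataset.

```python
from statistics import mean, median

# Made-up, right-skewed rates: most values are low, one is very high.
rates = [0.04, 0.05, 0.05, 0.06, 0.07, 0.90]

rate_mean = mean(rates)      # pulled upward by the single high value
rate_median = median(rates)  # stays close to the typical species

# A relative measure: each rate expressed as a multiple of the median.
relative = [r / rate_median for r in rates]
```

Here the mean is roughly 0.195 while the median is about 0.055, so the mean is more than three times the typical value even though only one species has a high rate.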
It’s crucial to keep in mind that the underlying data sources, such as the phylogenies and the distribution polygons, might differ in their taxonomic classifications due to various factors, including publication dates. Researchers are encouraged to harmonize the species taxonomy when using this dataset to ensure accuracy. The unique IDs provided can significantly help in this regard, making it easier to match species between different datasets.
Conclusion
This dataset on diversification metrics for vertebrates serves as a valuable resource for researchers. With proper care in analyzing and interpreting the data, it can yield insights into biodiversity patterns and the processes driving them. The aim is to facilitate research and encourage further exploration in the fields of macroecology and macroevolution. Users are encouraged to familiarize themselves with the methodologies and interpretations of the data to maximize its potential use in future studies.
Title: A dataset containing speciation rates, uncertainty, and presence-absence matrices for more than 34,000 vertebrate species
Abstract: The increasing availability of phylogenetic data has facilitated the exploration of macroecological and macroevolutionary patterns across diverse spatial and temporal scales. However, calculating some model-based diversification metrics often requires significant computational power and time. To address this, I present a comprehensive dataset of Bayesian diversification rates for over 34,000 vertebrate species, spanning five major groups: amphibians, birds, mammals, reptiles, and sharks. This is the first large-scale dataset of its kind, providing both continuous and binary data for speciation rates and presence-absence matrices, respectively, with global coverage. The dataset not only enables analyses of evolutionary and spatial diversity patterns but also democratizes access to data-intensive studies. Additionally, as it is based on Markov chains, the dataset can be customized and extended without the need to start from scratch, offering flexibility for future research on diversification dynamics.
Authors: Juan Daniel Vasquez-Restrepo
Last Update: 2024-12-02 00:00:00
Language: English
Source URL: https://www.biorxiv.org/content/10.1101/2024.04.09.588748
Source PDF: https://www.biorxiv.org/content/10.1101/2024.04.09.588748.full.pdf
Licence: https://creativecommons.org/licenses/by-nc/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to biorxiv for use of its open access interoperability.