OpenPedCan: Advancing Research on Pediatric Cancer
A project uniting data to improve pediatric cancer treatment options.
Table of Contents
- Background on Pediatric Cancer Data
- What Data is Included in OpenPedCan?
- How Data is Collected
- Data Harmonization
- Multi-Omic Data
- Tools and Methods Used
- Data Analysis Workflows
- Continuous Integration
- Project Collaboration
- Data Access and Distribution
- Importance of OpenPedCan
- Conclusion
- Original Source
- Reference Links
OpenPedCan is a project at the Children’s Hospital of Philadelphia that collects and analyzes pediatric cancer data. Its goal is to bring together information from many sources to better understand these diseases and to support the development of new treatments. The harmonized data is available to researchers through platforms such as CAVATICA, PedcBioPortal, and the NCI's pediatric Molecular Targets Platform.
Background on Pediatric Cancer Data
Pediatric cancer is a significant health issue, and understanding its genetic and molecular basis is essential for creating effective treatments. The OpenPedCan project works by harmonizing different datasets: it takes data from various studies and makes sure they can be compared and analyzed together, creating a more comprehensive resource for research.
What Data is Included in OpenPedCan?
OpenPedCan includes a diverse set of data types from several research initiatives. Here are some key components:
- Kids First Neuroblastoma: A dataset focusing on neuroblastoma, a common childhood cancer.
- Kids First PBTA: Part of a larger effort to study brain tumors in children.
- Chordoma Foundation: Data gathered to understand chordoma, a type of bone cancer.
- MI-ONCOSEQ Study: A clinical study that generates genetic information from tumors.
- CPTAC PBTA: Data from the Clinical Proteomic Tumor Analysis Consortium focusing on brain tumors.
- CPTAC GBM: Information about glioblastoma, a type of brain cancer.
- HOPE Proteomics: A dataset related to high-grade gliomas in young adults.
- Open Pediatric Brain Tumor Atlas: A significant resource for understanding various brain tumors in children.
How Data is Collected
The project began by collecting a large volume of genetic and clinical data from patients. This includes information about the tumors, treatments, and outcomes. The data is systematically gathered, ensuring that it is reliable and can be compared across different studies.
Data Harmonization
One of the challenges in cancer research is that data from different studies can be inconsistent. OpenPedCan addresses this through a process called harmonization. This involves standardizing the way data is collected and analyzed so that researchers can combine their findings and draw broader conclusions.
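To make the idea of harmonization concrete, here is a minimal sketch in Python. It is not OpenPedCan's actual pipeline: the study names, field names, and records are all hypothetical, and the only point is to show two differently shaped records being mapped onto one shared schema with consistent units.

```python
# Hypothetical field mappings: each study names the same concept differently.
COLUMN_MAPS = {
    "study_a": {"patient": "participant_id", "dx": "diagnosis",
                "age_days": "age_at_diagnosis_days"},
    "study_b": {"ParticipantID": "participant_id", "Diagnosis": "diagnosis",
                "age_years": "age_at_diagnosis_days"},
}

DAYS_PER_YEAR = 365.25


def harmonize(record, study):
    """Rename a record's fields to the shared schema and normalize values."""
    out = {"study": study}
    for source_name, target_name in COLUMN_MAPS[study].items():
        value = record[source_name]
        if source_name == "age_years":
            # Convert years to days so every study uses the same unit.
            value = round(float(value) * DAYS_PER_YEAR)
        elif target_name == "age_at_diagnosis_days":
            value = int(value)
        elif target_name == "diagnosis":
            # Normalize spelling differences such as case and stray spaces.
            value = value.strip().lower()
        out[target_name] = value
    return out


# Toy records, invented for illustration only.
study_a_rows = [{"patient": "A-001", "dx": "Neuroblastoma ", "age_days": "730"}]
study_b_rows = [{"ParticipantID": "B-17", "Diagnosis": "NEUROBLASTOMA", "age_years": "3"}]

combined = ([harmonize(r, "study_a") for r in study_a_rows]
            + [harmonize(r, "study_b") for r in study_b_rows])
for row in combined:
    print(row)
```

After this step both records share the same field names, the same diagnosis spelling, and the same age unit, so they can be analyzed as one table.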
Multi-Omic Data
The project collects what is known as multi-omic data. This refers to data from various biological layers, such as:
- Genomics: The study of genes and their functions.
- Transcriptomics: Looking at RNA and how genes are expressed.
- Proteomics: Analyzing proteins and their roles.
- Methylation Studies: Measuring chemical marks on DNA that help regulate gene activity.
By collecting all these different types of data, the project gains a fuller picture of pediatric cancers.
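As a rough illustration of how these layers fit together, the sketch below groups measurements from each layer under a shared sample ID. The sample IDs, feature names, and values are invented for the example and do not come from the OpenPedCan release.

```python
from collections import defaultdict

# Hypothetical per-layer measurements keyed by sample ID (toy values only).
layers = {
    "genomics": {"S1": {"somatic_snv_count": 12}, "S2": {"somatic_snv_count": 3}},
    "transcriptomics": {"S1": {"gene_x_expression": 8.1}},
    "proteomics": {"S2": {"protein_x_abundance": 0.4}},
    "methylation": {"S1": {"probe_beta": 0.73}, "S2": {"probe_beta": 0.10}},
}


def merge_layers(layers):
    """Build one profile per sample, holding whichever layers exist for it."""
    merged = defaultdict(dict)
    for layer_name, samples in layers.items():
        for sample_id, features in samples.items():
            merged[sample_id][layer_name] = features
    return dict(merged)


profiles = merge_layers(layers)
for sample_id in sorted(profiles):
    available = sorted(profiles[sample_id])
    print(sample_id, "has layers:", ", ".join(available))
```

Not every sample has every layer, so the merged profile records only what is available rather than assuming complete coverage.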
Tools and Methods Used
Data Analysis Workflows
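OpenPedCan uses reproducible, containerized (Docker) workflows to analyze the collected data. According to the project's abstract, Gabriella Miller Kids First workflows harmonize the sequencing data, and more than 60 analysis modules process the results. These modules can run locally or on Amazon Web Services.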
Continuous Integration
The project also utilizes continuous integration, which is a practice in software development that allows for regular updates to be tested and integrated without disrupting ongoing work. This ensures that the analysis remains reproducible and up-to-date.
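One kind of check a continuous integration system might run is sketched below, assuming a toy analysis step. The `summarize` function and its input are hypothetical and are not part of OpenPedCan. The test reruns the analysis on the same data in a different order and confirms the output is identical.

```python
import hashlib
import json


def summarize(counts):
    """Toy analysis step (hypothetical): mean mutation count per histology."""
    totals = {}
    for histology, count in counts:
        totals.setdefault(histology, []).append(count)
    return {h: sum(v) / len(v) for h, v in sorted(totals.items())}


def fingerprint(result):
    """Stable SHA-256 digest of a JSON-serializable result."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def test_summary_is_reproducible():
    data = [("neuroblastoma", 4), ("glioma", 10), ("neuroblastoma", 6)]
    first = fingerprint(summarize(data))
    # Input order should not affect the result of the analysis.
    second = fingerprint(summarize(list(reversed(data))))
    assert first == second, "analysis output changed between runs"


test_summary_is_reproducible()
print("reproducibility check passed")
```

If a code change altered the result, the digests would differ and the automated run would fail before the change reached other contributors.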
Project Collaboration
OpenPedCan encourages collaboration among researchers. By allowing contributions, the project not only fosters innovation but also ensures that various perspectives are taken into account. This is crucial in the complex field of cancer research.
Data Access and Distribution
The processed datasets are released in versioned form and can be downloaded through CAVATICA or AWS S3. They can also be queried through PedcBioPortal and the NCI's pediatric Molecular Targets Platform. This open access promotes further studies and helps speed up discoveries in pediatric cancer treatment.
Importance of OpenPedCan
OpenPedCan plays a crucial role in the fight against pediatric cancers by:
- Providing a comprehensive resource for researchers.
- Allowing for better comparisons between different studies.
- Supporting the development of new treatments based on shared knowledge.
Conclusion
In summary, OpenPedCan is an important initiative that aims to improve our understanding of pediatric cancers. By collecting and harmonizing data from various sources, the project strengthens the research community's ability to tackle these diseases. The collaborative nature and open access to data enable scientists to work together, share findings, and ultimately improve care for children facing cancer.
Title: The Open Pediatric Cancer Project
Abstract: Background: In 2019, the Open Pediatric Brain Tumor Atlas (OpenPBTA) was created as a global, collaborative open-science initiative to genomically characterize 1,074 pediatric brain tumors and 22 patient-derived cell lines. Here, we extend the OpenPBTA to create the Open Pediatric Cancer (OpenPedCan) Project, a harmonized open-source multi-omic dataset from 6,112 pediatric cancer patients with 7,096 tumor events across more than 100 histologies. Combined with RNA-Seq from the Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA), OpenPedCan contains nearly 48,000 total biospecimens (24,002 tumor and 23,893 normal specimens). Findings: We utilized Gabriella Miller Kids First (GMKF) workflows to harmonize WGS, WXS, RNA-seq, and Targeted Sequencing datasets to include somatic SNVs, InDels, CNVs, SVs, RNA expression, fusions, and splice variants. We integrated summarized CPTAC whole cell proteomics and phospho-proteomics data, miRNA-Seq data, and have developed a methylation array harmonization workflow to include m-values, beta-values, and copy number calls. OpenPedCan contains reproducible, dockerized workflows in GitHub, CAVATICA, and Amazon Web Services (AWS) to deliver harmonized and processed data from over 60 scalable modules which can be leveraged both locally and on AWS. The processed data are released in a versioned manner and accessible through CAVATICA or AWS S3 download (from GitHub), and queryable through PedcBioPortal and the NCI's pediatric Molecular Targets Platform. Notably, we have expanded PBTA molecular subtyping to include methylation information to align with the WHO 2021 Central Nervous System Tumor classifications, allowing us to create research-grade integrated diagnoses for these tumors. Conclusions: OpenPedCan data and its reproducible analysis module framework are openly available and can be utilized and/or adapted by researchers to accelerate discovery, validation, and clinical translation.
Authors: Jo Lynne Rokita, Z. Geng, E. Wafula, R. J. Corbett, Y. Zhang, R. Jin, K. S. Gaonkar, S. Shukla, K. S. Rathi, D. Hill, A. Lahiri, D. P. Miller, A. Sickler, K. Keith, C. Blackden, A. Chroni, M. A. Brown, A. A. Kraya, C. J. Koschmann, K. Aldape, X. Huang, B. R. Rood, J. L. Mason, G. R. Trooskin, Z. Abdullaev, P. Wang, Y. Zhu, B. K. Farrow, A. Farrel, J. M. Dybas, C. Zhong, N. Van Kuren, B. Zhang, M. Santi, S. Phul, A. T. Chinwalla, A. C. Resnick, S. J. Diskin, S. Tasian, S. Stefankiewicz, J. M. Maris, B. M. Ennis, M. R. Lueder, A. S. Naqvi, N. Coleman, W. Ma, D. Taylor
Last Update: 2024-07-11 00:00:00
Language: English
Source URL: https://www.biorxiv.org/content/10.1101/2024.07.09.599086
Source PDF: https://www.biorxiv.org/content/10.1101/2024.07.09.599086.full.pdf
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to bioRxiv for use of its open access interoperability.