Simple Science

Cutting edge science explained simply


OpenPedCan: Advancing Research on Pediatric Cancer

A project uniting data to improve pediatric cancer treatment options.




OpenPedCan (the Open Pediatric Cancer Project) is an open-science project at the Children’s Hospital of Philadelphia that collects, harmonizes, and analyzes data on childhood cancers. The goal is to bring together information from many separate studies to better understand these diseases and to help develop new treatments. The processed data are openly available to researchers through online platforms such as CAVATICA, PedcBioPortal, and the NCI's pediatric Molecular Targets Platform.

Background on Pediatric Cancer Data

Understanding the genetic and molecular basis of pediatric cancers is essential for creating effective treatments, but the relevant data are spread across many separate studies. OpenPedCan harmonizes these datasets: it processes data from different studies in the same way so that they can be compared and analyzed together. The resulting resource covers 6,112 pediatric cancer patients with 7,096 tumor events across more than 100 histologies (tumor types).

What Data is Included in OpenPedCan?

OpenPedCan includes a diverse set of data types from several research initiatives. Here are some key components:

  1. Kids First Neuroblastoma: A dataset focusing on neuroblastoma, a common childhood cancer.
  2. Kids First PBTA: Part of a larger effort to study brain tumors in children.
  3. Chordoma Foundation: Data gathered to understand chordoma, a type of bone cancer.
  4. MI-ONCOSEQ Study: A clinical study that generates genetic information from tumors.
  5. CPTAC PBTA: Data from the Clinical Proteomic Tumor Analysis Consortium focusing on brain tumors.
  6. CPTAC GBM: Information about glioblastoma, a type of brain cancer.
  7. HOPE Proteomics: A dataset related to high-grade gliomas in young adults.
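  8. Open Pediatric Brain Tumor Atlas (OpenPBTA): The open-science initiative, launched in 2019, that genomically characterized 1,074 pediatric brain tumors and 22 patient-derived cell lines, and which OpenPedCan extends.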

How Data is Collected

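The data come from the studies above and include whole-genome sequencing (WGS), whole-exome sequencing (WXS), RNA sequencing (RNA-seq), and targeted sequencing of tumors, together with clinical information about the patients. For comparison, OpenPedCan also includes RNA-seq data from the Genotype-Tissue Expression (GTEx) project and The Cancer Genome Atlas (TCGA), bringing the total to nearly 48,000 biospecimens (24,002 tumor and 23,893 normal specimens).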

Data Harmonization

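One of the challenges in cancer research is that data from different studies are often processed in different ways, which makes them hard to compare. OpenPedCan addresses this through harmonization: sequencing data from every study are run through the same Gabriella Miller Kids First (GMKF) workflows, so that results such as mutations, copy number changes, gene fusions, and gene expression are produced consistently no matter which study a sample came from. Researchers can then combine findings across studies and draw broader conclusions.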
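As a toy illustration of the idea, and not OpenPedCan's actual code, the Python sketch below maps sample records from two hypothetical studies, which name their columns differently and record age in different units, onto one shared schema. The study names, column names, and synonym table are all invented for the example. OpenPedCan's real harmonization also reprocesses the sequencing data through shared workflows, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarmonizedSample:
    """One sample expressed in a shared (hypothetical) schema."""
    sample_id: str
    study: str
    histology: str
    age_at_diagnosis_days: int


# Hypothetical per-study layouts: each study names its columns differently,
# and one of them records age in years rather than days.
STUDY_SCHEMAS = {
    "study_a": {"id": "sample", "histology": "dx", "age": "age_days", "age_unit": "days"},
    "study_b": {"id": "specimen", "histology": "tumor_type", "age": "age_years", "age_unit": "years"},
}

# Hypothetical synonym table so that one disease always gets one label.
HISTOLOGY_SYNONYMS = {
    "nbl": "Neuroblastoma",
    "neuroblastoma": "Neuroblastoma",
    "gbm": "Glioblastoma",
    "glioblastoma multiforme": "Glioblastoma",
}


def harmonize(record: dict, study: str) -> HarmonizedSample:
    """Rename fields, unify histology labels, and convert age to days."""
    schema = STUDY_SCHEMAS[study]
    raw_label = str(record[schema["histology"]]).strip().lower()
    histology = HISTOLOGY_SYNONYMS.get(raw_label, raw_label.title())
    age = float(record[schema["age"]])
    if schema["age_unit"] == "years":
        age *= 365.25  # average year length, accounting for leap years
    return HarmonizedSample(
        sample_id=str(record[schema["id"]]),
        study=study,
        histology=histology,
        age_at_diagnosis_days=round(age),
    )


a = harmonize({"sample": "A-1", "dx": "NBL", "age_days": 400}, "study_a")
b = harmonize({"specimen": "B-7", "tumor_type": "Neuroblastoma", "age_years": 3}, "study_b")
print(a.histology == b.histology)  # True: both now read "Neuroblastoma"
```

A production pipeline would also need to flag labels it cannot map, rather than passing them through with `title()` as this sketch does.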

Multi-Omic Data

The project collects what is known as multi-omic data. This refers to data from various biological layers, such as:

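  • Genomics: Changes in tumor DNA, including single-nucleotide variants (SNVs), small insertions and deletions (InDels), copy number variants (CNVs), and structural variants (SVs).
  • Transcriptomics: RNA measurements showing how strongly genes are expressed, plus gene fusions, splice variants, and microRNA (miRNA) data.
  • Proteomics: Protein and phosphorylated-protein levels, drawn from CPTAC whole-cell proteomics and phospho-proteomics data.
  • Methylation Studies: DNA methylation arrays, which measure chemical marks on DNA that help regulate whether genes are switched on or off.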

By collecting all these different types of data, the project gains a fuller picture of pediatric cancers.
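One practical payoff of collecting several layers is being able to ask which patients have been measured in all of them. The Python sketch below assumes each measurement is tagged with a patient ID and a layer name; the IDs and the inventory are invented for illustration and do not come from OpenPedCan.

```python
from collections import defaultdict

# Hypothetical inventory of (patient ID, omic layer) pairs.
ASSAYS = [
    ("PT_001", "genomics"),
    ("PT_001", "transcriptomics"),
    ("PT_001", "proteomics"),
    ("PT_001", "methylation"),
    ("PT_002", "genomics"),
    ("PT_002", "transcriptomics"),
    ("PT_003", "genomics"),
    ("PT_003", "methylation"),
]


def layers_by_patient(pairs):
    """Group the layers measured for each patient into a set."""
    layers = defaultdict(set)
    for patient, layer in pairs:
        layers[patient].add(layer)
    return dict(layers)


def patients_with_layers(pairs, required):
    """Return, sorted, the patients measured in every required layer."""
    required = set(required)
    return sorted(
        patient
        for patient, measured in layers_by_patient(pairs).items()
        if required <= measured  # subset test: all required layers present
    )


print(patients_with_layers(ASSAYS, {"genomics", "transcriptomics"}))
# ['PT_001', 'PT_002']
```

Using sets keeps the "measured in every required layer" question to a single subset comparison per patient.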

Tools and Methods Used

Data Analysis Workflows

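OpenPedCan's analyses are organized into more than 60 scalable modules. The workflows are packaged in Docker containers and hosted on GitHub, CAVATICA, and Amazon Web Services (AWS), so they can be run either on a local computer or at scale in the cloud.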
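The source paper's abstract describes these workflows as a set of more than 60 modules. One common way to organize such modules is to declare which outputs each one needs and let a topological sort decide the run order. The Python sketch below uses the standard-library `graphlib` module (Python 3.9+); the five module names and their dependencies are hypothetical, not OpenPedCan's actual modules.

```python
from graphlib import TopologicalSorter

# Hypothetical analysis modules, each mapped to the modules whose
# outputs it needs before it can run.
MODULE_DEPENDENCIES = {
    "harmonized_data": set(),
    "mutation_summary": {"harmonized_data"},
    "expression_summary": {"harmonized_data"},
    "methylation_summary": {"harmonized_data"},
    "molecular_subtyping": {
        "mutation_summary",
        "expression_summary",
        "methylation_summary",
    },
}


def run_order(dependencies):
    """Return an order in which every module runs after all of its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


for step, module in enumerate(run_order(MODULE_DEPENDENCIES), start=1):
    print(step, module)
```

Declaring dependencies explicitly means a new module only has to list its inputs, and an accidental cycle raises `graphlib.CycleError` instead of letting modules run out of order.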

Continuous Integration

The project also uses continuous integration, a software-development practice in which every proposed change is automatically tested before it is merged into the shared code. This helps keep the analyses reproducible and up to date as many contributors work on them at once.
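As a sketch of the kind of check a continuous-integration job might run (an assumption for illustration, not a description of OpenPedCan's actual tests), the Python below reruns a toy analysis module on a small fixed input and compares a fingerprint of its output with one recorded earlier. The module `count_by_histology` and the sample data are hypothetical.

```python
import hashlib
import json
from collections import Counter


def count_by_histology(samples):
    """Toy analysis module (hypothetical): number of samples per histology."""
    counts = Counter(sample["histology"] for sample in samples)
    return [{"histology": label, "n": n} for label, n in counts.items()]


def fingerprint(rows):
    """SHA-256 of the rows, serialized so that row and key order don't matter."""
    canonical_rows = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(canonical_rows).encode("utf-8")).hexdigest()


# Small, fixed test input (hypothetical).
TEST_SAMPLES = [
    {"id": "S1", "histology": "Neuroblastoma"},
    {"id": "S2", "histology": "Glioblastoma"},
    {"id": "S3", "histology": "Neuroblastoma"},
]

# In a real CI setup this value would be stored alongside the tests;
# here it is simply computed once from the current code.
EXPECTED = fingerprint(count_by_histology(TEST_SAMPLES))


def test_count_by_histology_is_reproducible():
    # Reordering the input changes the row order the module emits,
    # but must not change the fingerprinted result.
    shuffled = [TEST_SAMPLES[1], TEST_SAMPLES[0], TEST_SAMPLES[2]]
    assert fingerprint(count_by_histology(shuffled)) == EXPECTED


test_count_by_histology_is_reproducible()
print("reproducibility check passed")
```

Fingerprinting a canonical serialization, rather than comparing raw output, keeps the test from failing on harmless differences such as row order.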

Project Collaboration

OpenPedCan encourages collaboration among researchers. It grew out of OpenPBTA, a global, collaborative open-science initiative, and its openly available analysis framework can be used or adapted by other researchers. Drawing on expertise from many groups matters in a field as complex as pediatric cancer research.

Data Access and Distribution

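OpenPedCan's processed data are released in numbered versions. Researchers can download them through CAVATICA or from AWS S3 (with instructions on GitHub), or query them interactively through PedcBioPortal and the NCI's pediatric Molecular Targets Platform. Open access of this kind lets other groups build on the data and can speed up discoveries in pediatric cancer treatment.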

Importance of OpenPedCan

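OpenPedCan supports research on pediatric cancers by:

  • Providing a comprehensive, harmonized resource for researchers.
  • Allowing for better comparisons between different studies.
  • Refining brain tumor diagnoses: molecular subtyping now uses methylation data to align with the WHO 2021 classification of Central Nervous System tumors, producing research-grade integrated diagnoses.
  • Supporting the development of new treatments based on shared knowledge.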

Conclusion

In summary, OpenPedCan is an important initiative that aims to improve our understanding of pediatric cancers. By collecting and harmonizing data from various sources, the project strengthens the research community's ability to tackle these diseases. The collaborative nature and open access to data enable scientists to work together, share findings, and ultimately improve care for children facing cancer.

Original Source

Title: The Open Pediatric Cancer Project

Abstract:

Background: In 2019, the Open Pediatric Brain Tumor Atlas (OpenPBTA) was created as a global, collaborative open-science initiative to genomically characterize 1,074 pediatric brain tumors and 22 patient-derived cell lines. Here, we extend the OpenPBTA to create the Open Pediatric Cancer (OpenPedCan) Project, a harmonized open-source multi-omic dataset from 6,112 pediatric cancer patients with 7,096 tumor events across more than 100 histologies. Combined with RNA-Seq from the Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA), OpenPedCan contains nearly 48,000 total biospecimens (24,002 tumor and 23,893 normal specimens).

Findings: We utilized Gabriella Miller Kids First (GMKF) workflows to harmonize WGS, WXS, RNA-seq, and Targeted Sequencing datasets to include somatic SNVs, InDels, CNVs, SVs, RNA expression, fusions, and splice variants. We integrated summarized CPTAC whole cell proteomics and phospho-proteomics data, miRNA-Seq data, and have developed a methylation array harmonization workflow to include m-values, beta-values, and copy number calls. OpenPedCan contains reproducible, dockerized workflows in GitHub, CAVATICA, and Amazon Web Services (AWS) to deliver harmonized and processed data from over 60 scalable modules which can be leveraged both locally and on AWS. The processed data are released in a versioned manner and accessible through CAVATICA or AWS S3 download (from GitHub), and queryable through PedcBioPortal and the NCI's pediatric Molecular Targets Platform. Notably, we have expanded PBTA molecular subtyping to include methylation information to align with the WHO 2021 Central Nervous System Tumor classifications, allowing us to create research-grade integrated diagnoses for these tumors.

Conclusions: OpenPedCan data and its reproducible analysis module framework are openly available and can be utilized and/or adapted by researchers to accelerate discovery, validation, and clinical translation.

Authors: Jo Lynne Rokita, Z. Geng, E. Wafula, R. J. Corbett, Y. Zhang, R. Jin, K. S. Gaonkar, S. Shukla, K. S. Rathi, D. Hill, A. Lahiri, D. P. Miller, A. Sickler, K. Keith, C. Blackden, A. Chroni, M. A. Brown, A. A. Kraya, C. J. Koschmann, K. Aldape, X. Huang, B. R. Rood, J. L. Mason, G. R. Trooskin, Z. Abdullaev, P. Wang, Y. Zhu, B. K. Farrow, A. Farrel, J. M. Dybas, C. Zhong, N. Van Kuren, B. Zhang, M. Santi, S. Phul, A. T. Chinwalla, A. C. Resnick, S. J. Diskin, S. Tasian, S. Stefankiewicz, J. M. Maris, B. M. Ennis, M. R. Lueder, A. S. Naqvi, N. Coleman, W. Ma, D. Taylor

Last Update: 2024-07-11

Language: English

Source URL: https://www.biorxiv.org/content/10.1101/2024.07.09.599086

Source PDF: https://www.biorxiv.org/content/10.1101/2024.07.09.599086.full.pdf

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to biorxiv for use of its open access interoperability.
