The Ongoing Quest for Missing Human Genes
Scientists continue to search for uncharted territories in the human genome.
Table of Contents
- Why Are There Still Missing Pieces?
- New Actors in the Protein Show
- Peering into the Unknown
- The Trouble with Protein Detection
- A Treasure Hunt in PeptideAtlas
- The Case of the Non-Human Proteins
- The Quest for Novel Genes
- The Struggle with Validity
- The Oddities of Aberrant Proteins
- Conclusion: The Ever-Expanding Gene Map
- Original Source
Efforts to map the human genome, the blueprint for our entire biology, have made significant progress. Some of the hardest parts to sequence, notably the heterochromatic regions and the Y chromosome, have now been filled in by the Telomere-to-Telomere (T2T) consortium. The catalog of human genes, however, still has gaps: think of it as a puzzle with some pieces missing.
Why Are There Still Missing Pieces?
The incomplete gene list is partly the result of a family feud between reference databases. These databases are supposed to tell us which genes actually make proteins, but they don’t always agree. Estimates of the number of human protein-coding genes have ranged from 19,000 to 35,000, but the latest counts are down to just over 19,000. It’s a bit like counting sheep, only to find that some have jumped over the fence and disappeared.
At the same time, new evidence is coming from ribosome profiling, a technique that sequences the RNA fragments protected by ribosomes to reveal which stretches of RNA are being translated into protein. This research hints that a large number of unrecognized protein-coding regions may be lurking in our genome; some reports suggest as many as 7,000 new candidates, which would increase the number of known protein-coding genes by about 30%. It’s like finding a bonus level in a video game that you didn’t know existed!
New Actors in the Protein Show
Among these potential new genes are some notable characters, such as APELA, MIURF and MYMX. The names might sound like a band lineup, but they represent new types of genes that researchers are keeping an eye on. What connects them is not their length but the fact that they can be traced back through evolution. Most of the newly proposed genes, however, lack this evolutionary track record, which leaves their status much less certain.
Peering into the Unknown
Many of the genes that researchers are trying to identify may turn out to be less important than they first appeared. Some could simply be evolutionary changes that contribute nothing essential to our biology. This raises the question of how often scientists might be barking up the wrong tree when trying to recognize novel proteins.
The Trouble with Protein Detection
To make sense of this complex puzzle, researchers have turned to proteomics, the large-scale study of proteins. Unfortunately, many newly proposed proteins do not show up in proteomics data, which is cause for suspicion. If these proteins are real, there should be solid evidence for them, yet a recent study found only a handful of matches.
One possible reason is that the detection methods themselves have blind spots. Small proteins, or proteins with unusual amino acid compositions, might slip through the cracks. Alternatively, the proteins may be produced but degraded almost as quickly as they are made, like that sock that always goes missing in the dryer.
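To see why small proteins are hard to catch, it helps to know that most proteomics experiments first chop proteins into peptides with the enzyme trypsin and then identify those peptides by mass spectrometry. The Python sketch below assumes trypsin's textbook cleavage rule (cut after lysine K or arginine R, but not before proline P) and a hypothetical detection window of 7 to 30 residues; both sequences are made up for illustration. A short protein can end up with no peptide in the usable range at all.

```python
import re

# Trypsin cuts after K or R, unless the next residue is P (textbook rule).
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(protein: str) -> list[str]:
    """Fully digested tryptic peptides (no missed cleavages)."""
    return [p for p in TRYPSIN_SITE.split(protein) if p]


def detectable_peptides(protein: str, min_len: int = 7, max_len: int = 30) -> list[str]:
    """Peptides whose length falls inside a hypothetical detection window."""
    return [p for p in tryptic_peptides(protein) if min_len <= len(p) <= max_len]


# Two made-up sequences: a very short ORF and a somewhat longer protein fragment
short_orf = "MSKRGLLK"
longer_protein = "MAEGLLSTWPKDVFLAGTNSPIQRELVNGYAAQTPLK"

print(tryptic_peptides(short_orf))               # ['MSK', 'R', 'GLLK']
print(detectable_peptides(short_orf))            # []
print(len(detectable_peptides(longer_protein)))  # 3
```

Real experiments allow missed cleavages and other enzymes, so this only illustrates the general trend: fewer residues means fewer chances to produce a peptide the instrument can identify.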
A Treasure Hunt in PeptideAtlas
To find out more about these elusive proteins, the researchers turned to PeptideAtlas, a large public repository of peptides identified in mass spectrometry experiments that can act like a treasure map to previously hidden proteins. By combing through this database, they hoped to detect proteins missed by GENCODE, the reference catalog of human genes.
After filtering through large amounts of data, they found more than 13,000 novel peptides, or protein fragments, that did not match any annotated protein. Many of these peptides, however, turned out to come from variants of already-known proteins, such as alternative isoforms or unannotated upstream translations. So while it felt like finding a new island on the map, it was more often like discovering a slightly altered version of an island you already knew.
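The first filtering step, separating peptides that match an annotated protein from those that do not, can be sketched as a simple substring search. This is a minimal illustration, not the authors' pipeline: the reference protein and peptides are hypothetical, and the only mass-spectrometry detail it handles is that isoleucine (I) and leucine (L) have the same mass and are therefore usually treated as interchangeable.

```python
def normalise(seq: str) -> str:
    """Treat isoleucine and leucine as identical, since they have the same mass."""
    return seq.replace("I", "L")


def split_known_and_novel(
    peptides: list[str], reference_proteins: list[str]
) -> tuple[list[str], list[str]]:
    """Split peptides into those found in an annotated protein and those found in none."""
    reference = [normalise(p) for p in reference_proteins]
    known, novel = [], []
    for peptide in peptides:
        query = normalise(peptide)
        if any(query in protein for protein in reference):
            known.append(peptide)
        else:
            novel.append(peptide)
    return known, novel


# Hypothetical annotated protein and observed peptides
reference_proteins = ["MAEGLLSTWPKDVFLAGTNSPIQR"]
observed = ["DVFLAGTNSPLQR", "GHWTVEPLK"]

known, novel = split_known_and_novel(observed, reference_proteins)
print("known:", known)  # known: ['DVFLAGTNSPLQR'] (matches ...SPIQR once I and L are merged)
print("novel:", novel)  # novel: ['GHWTVEPLK']
```

A peptide in the "novel" pile is only a starting point; as described above, many such peptides later turn out to belong to isoforms of known genes.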
The Case of the Non-Human Proteins
In an unexpected twist, the researchers also stumbled upon peptides that shouldn’t be there at all: proteins from fruit flies, mice and even bacteria! It’s like finding a shrunken woolly mammoth in your fridge, completely out of place. How did this happen? The most likely explanation is cross-contamination, with material from other species accidentally ending up in human samples.
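Spotting this kind of contamination can be sketched with the same matching idea, just against several references. In the hypothetical example below, each peptide is checked against an invented, heavily truncated proteome for each species; peptides that match another species but no human protein are flagged, while peptides that also match a human protein are left alone because they cannot be pinned to one species. The I/L ambiguity is ignored here for brevity.

```python
def species_hits(peptide: str, proteomes: dict[str, list[str]]) -> list[str]:
    """Return the species whose proteome contains the peptide, in sorted order."""
    return sorted(
        species
        for species, proteins in proteomes.items()
        if any(peptide in protein for protein in proteins)
    )


def contamination_candidates(
    peptides: list[str], proteomes: dict[str, list[str]], host: str = "human"
) -> dict[str, list[str]]:
    """Flag peptides that match at least one species but not the host."""
    flagged = {}
    for peptide in peptides:
        hits = species_hits(peptide, proteomes)
        if hits and host not in hits:
            flagged[peptide] = hits
    return flagged


# Hypothetical, heavily truncated proteomes
proteomes = {
    "human": ["MAEGLLSTWPKDVFLAGTNSPIQR"],
    "mouse": ["MAEGLLSTWPKEVFLSGTNAPVQR"],
    "fly": ["MSNQTWERPGLLK"],
}
observed = ["MAEGLLSTWPK", "EVFLSGTNAPVQR", "TWERPGLLK"]

print(contamination_candidates(observed, proteomes))
# {'EVFLSGTNAPVQR': ['mouse'], 'TWERPGLLK': ['fly']}
```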
The Quest for Novel Genes
After setting aside these detours, the researchers were left with 34 novel open reading frames (ORFs), stretches of sequence that could encode a protein, which had reliable peptide evidence but were not annotated in the reference catalog. Almost half of these belong to protein-coding genes that are missing from GENCODE and other reference sets, while others seem to be the result of past errors or random events.
One particular candidate, GBA3, is puzzling: it has the hallmarks of a protein-coding gene, yet it also carries a frameshift, a change in how the genetic code is read that suggests the protein should not be functional. It’s a bit like a sentence in which one letter has been dropped, garbling every word after it!
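Why does a frameshift usually break a protein? The genetic code is read three letters (one codon) at a time, so deleting a single base changes every codon downstream. The sketch below uses the standard genetic code and a short made-up DNA sequence (not the real GBA3 sequence) to show how one deletion turns a clean reading frame into a garbled one that hits a stop codon early.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# Hypothetical coding sequence
original = "ATGGCTGAAGGTCTGCTGAGCACCTGGTAA"
# Delete a single base (index 4) to create a frameshift
shifted = original[:4] + original[5:]

print(translate(original))  # MAEGLLSTW*
print(translate(shifted))   # MVKVC*
```

Every amino acid after the deletion changes, and translation ends at a premature stop codon (TGA), which is why a frameshift is normally read as a sign of a broken gene.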
The Struggle with Validity
Working through the PeptideAtlas database is not just about collecting data but also about validating it. The researchers sifted through the entries to determine whether each one represents a genuine protein, a misclassified variant or a remnant of a gene that was once functional. The process resembles a detective story, with researchers piecing together clues to uncover the truth behind each entry.
After careful consideration, many entries appear to be misidentified proteins or remnants of proteins that no longer play a role in human physiology. Some are genuinely intriguing discoveries, while others appear to be the product of gene annotation errors that have lingered too long.
The Oddities of Aberrant Proteins
Even more curious are the proteins that appear only in cancer cells. It’s like finding a secret club of proteins that meets only under abnormal circumstances. Most of these ORFs are not conserved beyond humans, and the researchers argue that this is strong evidence of aberrant translation: by-products of the chaotic nature of cancer cells rather than genuine genes. This raises the question of how such ORFs should be annotated in reference genomes.
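The reasoning here, that ORFs seen only in cancer cell lines and not conserved beyond humans are likely products of aberrant translation, can be written down as a simple rule of thumb. The record type, sample labels and ORF names below are all hypothetical; this is a sketch of the logic, not the authors' actual classification procedure.

```python
from dataclasses import dataclass, field


@dataclass
class OrfEvidence:
    """Hypothetical summary of the peptide evidence for one unannotated ORF."""
    orf_id: str
    sample_types: set[str] = field(default_factory=set)  # e.g. {"cancer_cell_line", "normal_tissue"}
    conserved_beyond_human: bool = False


def likely_aberrant(orf: OrfEvidence) -> bool:
    """True if all peptide evidence comes from cancer cell lines and the ORF is human-specific."""
    cancer_only = bool(orf.sample_types) and orf.sample_types <= {"cancer_cell_line"}
    return cancer_only and not orf.conserved_beyond_human


candidates = [
    OrfEvidence("ORF_1", {"cancer_cell_line"}, conserved_beyond_human=False),
    OrfEvidence("ORF_2", {"cancer_cell_line", "normal_tissue"}, conserved_beyond_human=True),
    OrfEvidence("ORF_3", {"cancer_cell_line"}, conserved_beyond_human=True),
]

print([orf.orf_id for orf in candidates if likely_aberrant(orf)])  # ['ORF_1']
```

Requiring both conditions keeps conserved ORFs such as the hypothetical ORF_3 in play even when their only peptide evidence so far comes from cancer lines.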
Conclusion: The Ever-Expanding Gene Map
In the end, the search for human genes is a winding road filled with stops, starts, and plenty of quirky detours. Some discoveries hold real promise, while others might just be a case of mistaken identity. As we continue to dig deeper into our genetic makeup, every new finding could reshape our understanding of what it means to be human. It’s an exciting time in genetics, akin to being on the cusp of discovering a new continent - only instead of land, we’re unearthing the intricate web of life that makes us who we are.
And who knows? The next twist in this genomic tale could reveal a whole new layer of complexity - or a whole new cast of characters that make up our biological story. The adventure continues!
Title: A deep audit of the PeptideAtlas database uncovers evidence for unannotated coding genes and aberrant translation
Abstract: The human genome has been the subject of intense scrutiny by experimental and manual curation projects for more than two decades. Novel coding genes have been proposed from large-scale RNASeq, ribosome profiling and proteomics experiments. Here we carry out an in-depth analysis of an entire proteomics database. We analysed the proteins, peptides and spectra housed in the human build of the PeptideAtlas proteomics database to identify coding regions that are not yet annotated in the GENCODE reference gene set. We find support for hundreds of missing alternative protein isoforms and unannotated upstream translations, and evidence of cross-contamination from other species. There was reliable peptide evidence for 34 novel unannotated open reading frames (ORFs) in PeptideAtlas. We find that almost half belong to coding genes that are missing from GENCODE and other reference sets. Most of the remaining ORFs were not conserved beyond human, however, and their peptide confirmation was restricted to cancer cell lines. We show that this is strong evidence for aberrant translation, raising important questions about the extent of aberrant translation and how these ORFs should be annotated in reference genomes.
Authors: Jose Manuel Rodriguez, Miguel Maquedano, Daniel Cerdan-Velez, Enrique Calvo, Jesús Vazquez, Michael L. Tress
Last Update: Nov 15, 2024
Language: English
Source URL: https://www.biorxiv.org/content/10.1101/2024.11.14.623419
Source PDF: https://www.biorxiv.org/content/10.1101/2024.11.14.623419.full.pdf
Licence: https://creativecommons.org/licenses/by-nc/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to biorxiv for use of its open access interoperability.