Simple Science

Cutting edge science explained simply


CADD: A Tool for Genetic Health Insight

CADD helps identify harmful genetic changes across species.




CADD stands for Combined Annotation Dependent Depletion. Quite a mouthful, huh? But it's basically a fancy way to figure out if changes in our DNA could be harmful or not. It's like having a super-smart friend who helps you decide if that weird-looking fruit is actually edible or if it will send you running for the bathroom.

Why Do We Care About Genetic Changes?

In our DNA, there are many tiny changes called variants. These can happen naturally and may not have any effect on a person's health. But some variants might lead to diseases or other health issues. Knowing which changes are bad can help doctors and researchers find better treatments and understand how to keep us all healthier.

How Does CADD Work?

CADD uses a machine learning model to take a close look at these variants. Think of machine learning as a very clever robot that learns from past data. This robot looks at a ton of information about our genes and their characteristics. It figures out which variants are likely to be harmless and which ones could cause trouble.

Instead of just using a few known examples of harmful or harmless variants, CADD learns from a lot of data, which gives it a better chance of being right. It looks at variants that have been around for a while to see which ones seem to get along just fine with the rest of our genetic makeup.
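
To make the "robot that learns from data" idea concrete, here is a minimal sketch of a classifier learning to tell two groups of variants apart from numeric features. Everything in it is an assumption made for illustration: the two features (a conservation score and a coding-region flag), the toy data, and the training settings. The real CADD model uses many more annotations and far more data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias with plain gradient descent on log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this variant (predicted probability minus label).
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict(weights, bias, x):
    """Probability that a variant belongs to the 'potentially harmful' group."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical features per variant: [conservation score 0..1, in coding region 0/1]
features = [
    [0.10, 0], [0.20, 0], [0.15, 1], [0.30, 0],  # label 0: likely harmless
    [0.90, 1], [0.80, 1], [0.95, 0], [0.85, 1],  # label 1: potentially harmful
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(features, labels)
low = predict(weights, bias, [0.12, 0])   # resembles the harmless group
high = predict(weights, bias, [0.92, 1])  # resembles the harmful group
print(f"harmless-looking: {low:.3f}, harmful-looking: {high:.3f}")
```

The point is only that the model is never told a rule. It adjusts its weights until the two groups separate, then applies those weights to variants it has not seen.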

What’s New in CADD?

CADD was initially developed for humans but has since been adapted for other animals. It’s been applied to mice, chickens, and even pigs. Why? Because researchers want to use this knowledge for livestock and other species too. It’s like making a great recipe and then tweaking it to suit different tastes or dietary needs.

Now, thanks to advances in science, we have more high-quality genetic data available. This means we can set up an automated system to create CADD scores for more species quickly and accurately.

The CADD Workflow Simplified

Here’s how the whole CADD process works, broken down into steps:

  1. Get the Ancestral Sequence: First, we need to know what the "old" version of our DNA looked like before changes occurred. This gives us a baseline.

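  2. Create Variants: Next, we build two sets of variants from this ancestral sequence: changes that actually became established over time (likely harmless) and simulated changes that never faced natural selection (potentially harmful). It’s like spotting the differences in a puzzle.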

  3. Annotate Variants: At this stage, the variants are labeled with various features that help us understand their significance. These labels are based on data from previous studies.

  4. Train the CADD Model: We teach the model to distinguish between harmful and harmless variants using all the information collected.

  5. Generate CADD Scores: Finally, the model assigns scores to every possible change in the sequence. These scores help researchers quickly figure out which variants are worth investigating further.
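
The first two steps above can be sketched in a few lines of Python. This is a simplified illustration, not the pipeline's actual code: the function names and the two short sequences are invented, and a real run would work from a multi-species alignment and a more careful simulation than uniform random substitutions.

```python
import random

BASES = "ACGT"

def derived_variants(ancestral, reference):
    """Positions where the present-day reference differs from the ancestor.

    These changes survived over evolutionary time, so they serve as the
    'probably harmless' training set.
    """
    return [
        (pos, anc, ref)
        for pos, (anc, ref) in enumerate(zip(ancestral, reference))
        if anc != ref and anc in BASES and ref in BASES
    ]

def simulated_variants(ancestral, n, seed=0):
    """Random single-base changes to the ancestor.

    Nothing has filtered these, so they serve as the 'potentially harmful'
    training set. May return fewer than n if the ancestor contains
    unknown bases such as 'N'.
    """
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        pos = rng.randrange(len(ancestral))
        anc = ancestral[pos]
        if anc not in BASES:
            continue
        alt = rng.choice([b for b in BASES if b != anc])
        variants.append((pos, anc, alt))
    return variants

# Hypothetical toy sequences (real genomes are billions of bases long).
ancestral = "ACGTACGTAC"
reference = "ACGAACGTCC"

benign_proxy = derived_variants(ancestral, reference)
deleterious_proxy = simulated_variants(ancestral, n=len(benign_proxy))

print("derived:  ", benign_proxy)
print("simulated:", deleterious_proxy)
```

Each tuple is (position, ancestral base, new base). Generating as many simulated variants as derived ones keeps the two training groups the same size in this toy setup.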

Getting Down to Details

When we talk about the variants, there are two main categories: benign (harmless) and deleterious (harmful). The benign variants are like your friend who always shows up on time for dinner: reliable and not causing any trouble. The deleterious variants, on the other hand, are like the friend who brings the fruitcake no one wants to eat: still hanging around, but best avoided!

To figure out these categories, the model looks at how these variants have evolved in the past. For example, if a change is very common in a population or has been around for ages, it's likely harmless, because natural selection has had plenty of time to weed it out if it caused problems. The second group is made up of simulated variants: changes generated on a computer that have no natural history. Because selection has never tested them, this group is more likely to include harmful changes.

The Pipeline Magic

This CADD process is carried out using a system called Snakemake, which automates a lot of the work. Think of it as having a personal assistant who organizes your life so you don’t have to juggle everything yourself.

The entire process is quite flexible. If you want to tweak how the scores are calculated or change the data used, you can do that based on your needs. Why not, right? It's better than having to do everything manually!
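
The sketch below is not Snakemake code. It only illustrates, in plain Python, the core idea that a workflow manager relies on: each step declares what it depends on, and the tool works out a valid order to run things in. The step names are hypothetical labels for the five workflow stages described earlier.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps that must finish before it can start.
# These names are illustrative, not the pipeline's real rule names.
steps = {
    "ancestral_sequence": set(),
    "create_variants": {"ancestral_sequence"},
    "annotate_variants": {"create_variants"},
    "train_model": {"annotate_variants"},
    "score_all_variants": {"train_model"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(steps).static_order())

for number, step in enumerate(order, start=1):
    print(number, step)
```

Because the order is computed from the declared dependencies, swapping in different data or changing one step means editing one entry instead of rewriting the whole procedure.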

Chicken and Turkey CADD Scores

The latest updates to CADD have been applied to chickens and turkeys. Researchers built a new model specifically for these birds to help farmers and scientists understand their genetics better.

In the process of building these scores, a large set of variant scores was created for both chickens and turkeys using the updated reference genomes. It’s like creating a family tree but for genetic variants, with lots of branching paths and connections!

They looked at about 47 million genetic variations in chickens and around 68 million in turkeys. After training the model, the researchers found that it performed much better than previous versions. It's like upgrading from a bicycle to a sports car!

Importance of Annotations

Now, what good is a score without context? That’s where annotations come in. Annotations provide helpful background information about the variants. They can tell us whether a variant is found in an important part of the gene or if it connects to other factors that might influence health.

These annotations can come from databases that track all kinds of genetic info. They can include everything from how often a certain variant appears in a population to its potential effects on protein production. Basically, it’s like getting a report card for each variant.

Scoring the Variants

CADD scores are PHRED-like rank scores: each variant is scored by where it ranks among all possible variants, not by the model's raw output. That makes them easy to read, kind of like grading your final exam. Higher scores indicate a greater likelihood of a variant being harmful. The scoring formula is designed to make it simple to see which variants need further investigation.

For instance, if you find a variant with a high score, it might be worth more scrutiny, much like how you would pay more attention to a test answer that makes no sense.
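
The source abstract describes the scores as PHRED-like rank scores. Assuming the usual PHRED convention of -10 × log10(rank / total), the scaling can be sketched as below. The raw scores are made up, and ties between raw scores are not handled specially here.

```python
import math

def phred_like_scores(raw_scores):
    """Convert raw model scores to PHRED-like rank scores.

    The variant with the highest raw score gets rank 1, and each
    variant's scaled score is -10 * log10(rank / total).
    """
    n = len(raw_scores)
    # Indices of the variants, ordered from highest to lowest raw score.
    order = sorted(range(n), key=lambda i: raw_scores[i], reverse=True)
    scaled = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = -10 * math.log10(rank / n)
    return scaled

# Hypothetical raw scores for 1000 variants.
raw = [0.1 * i for i in range(1000)]
scaled = phred_like_scores(raw)

top_variant = max(scaled)
at_least_10 = sum(1 for s in scaled if s >= 10 - 1e-9)
print(f"best score: {top_variant:.1f}, variants scoring 10 or more: {at_least_10}")
```

Under this convention a score of 10 or more puts a variant in the top 10% of all ranked variants, 20 or more in the top 1%, and 30 or more in the top 0.1%.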

The Big Picture

This CADD approach doesn’t just stop at chickens and turkeys. It’s a flexible process that can be applied to any species with a high-quality genome, gene annotation, and a multi-species alignment. This means researchers can quickly and effectively prioritize which genetic changes to study more closely, making their work easier and faster.

The result? A more efficient system for understanding genetic variants that could affect the health of various species. Whether it's livestock or wild animals, this tool helps ensure that scientists can keep tabs on genetic changes that matter.

Conclusion

CADD may have started as a tool for humans, but it has grown to be a valuable resource for many species, including our feathered friends. With a clever combination of genetic data, machine learning, and automation, researchers are paving the way for better understanding and management of genetic health in animals.

So next time you think about DNA, remember it's not just a series of letters; it's a complex puzzle. And with tools like CADD, we’re getting closer to solving it, one variant at a time!

Original Source

Title: A generic pipeline for CADD score generation: chickenCADD and turkeyCADD

Abstract: Combined Annotation Dependent Depletion (CADD) is a machine learning approach used to predict the deleteriousness of genetic variants across a genome. By integrating diverse genomic features, CADD assigns a PHRED-like rank score to each potential variant. Unlike other methods, CADD does not rely on limited datasets of known pathogenic or benign variants but uses larger and less biased training sets. The rapid increase in high-quality genomes and functional annotations across species highlights the need for an automated, non-species-specific pipeline to generate CADD scores. Here, we introduce such a pipeline, facilitating the generation of CADD scores for various species using only a high-quality genome with gene annotation and a multi-species alignment. Additionally, we present updated chickenCADD scores and newly generated turkeyCADD scores, both generated with the pipeline.

Authors: K. Lensing, JGC. van Schipstal, D. de Ridder, MAM. Groenen, MFL. Derks

Last Update: Nov 3, 2024

Language: English

Source URL: https://www.biorxiv.org/content/10.1101/2024.11.01.621569

Source PDF: https://www.biorxiv.org/content/10.1101/2024.11.01.621569.full.pdf

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to biorxiv for use of its open access interoperability.
