Simple Science

Cutting edge science explained simply


New Advances in Mouse Genome Research

Research improves understanding of mouse genetics and fills gaps in genome.

Thomas M. Keane, B. Francis, L. Gozashti, K. Costello, T. Kasahara, O. S. Harringmeyer, J. Lilue, M. Helmy, T. Kato, A. Czechanski, M. Quail, I. Bonner, E. Dawson, A. F. Smith, L. Reinholdt, D. J. Adams




Mice have been a key part of scientific research for over a century. They help scientists learn about human diseases, find treatments, and understand how our bodies work. Some important discoveries made using mice include the role of certain genes in our immune system and the creation of special stem cells that can turn into different types of cells.

In 2002, scientists published the first reference genome of a mouse, specifically the C57BL/6J strain. The genome is the complete set of DNA in an organism. The mouse genome has 19 pairs of autosomes plus the X and Y sex chromosomes. Some parts of the mouse genome are hard to study because of the way its chromosomes are built, with long stretches of highly repetitive DNA. The current mouse reference genome is still incomplete, with 281 gaps. These gaps are present in every chromosome, and important areas like telomeres and centromeres remain unresolved.

The Importance of Telomeres and Centromeres

Telomeres are the protective ends of chromosomes that help prevent them from being damaged. Centromeres play an important role during cell division, helping to separate chromosomes correctly. The current mouse genome does not accurately reflect these important parts of the chromosomes.

Recent advancements in DNA sequencing technology offer a way to create more complete mouse genomes. This study used a method that allows scientists to read very long stretches of DNA, providing an opportunity to fill in the gaps in mouse genetics.

Creating T2T Mouse Genomes

Researchers produced the first complete mouse genomes for the C57BL/6J and CAST/EiJ strains. These complete genomes, referred to as T2T (Telomere-to-Telomere), contain all the previously missing structures, including complete telomeres and centromeres. The new genomes are more complete than the old versions, greatly expanding the understanding of mouse genetics.

DNA was obtained from embryonic stem cells from a mix of the two mouse strains. The scientists used both long-read and short-read sequencing methods to assemble the genomes. They created six different genome assemblies and compared them, selecting the best one based on a set of quality measures.
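The summary does not say which quality measures were used to rank the six assemblies. As a minimal sketch, assuming contiguity was one of them, the Python below computes N50, a widely used contiguity statistic, and picks the candidate with the highest value. The function names and the idea of ranking on N50 alone are illustrative assumptions, not the authors' procedure.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def pick_most_contiguous(assemblies):
    """Given {assembly_name: [contig lengths]}, return the name of the
    assembly with the highest N50 (hypothetical single-metric ranking)."""
    return max(assemblies, key=lambda name: n50(assemblies[name]))
```

In practice, assemblies are usually judged on several measures together (for example base accuracy and gene completeness), so a single-number ranking like this is only a starting point.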

Finding Missing Parts of the Genome

Some chromosomes were still incomplete, lacking telomeric sequences at their ends. To find these missing parts, researchers looked for specific repeated sequences in the unplaced DNA fragments. They also used a method called long-range Hi-C to help assign the remaining sequences to the correct chromosomes.
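The search for repeated sequences in unplaced fragments can be sketched in Python. This is a simplified illustration, not the authors' pipeline: it assumes the repeat of interest is the canonical vertebrate telomere unit TTAGGG (or its reverse complement CCCTAA on the opposite strand), and the function name and `min_units` threshold are hypothetical.

```python
import re

TELOMERE_UNIT = "TTAGGG"  # canonical vertebrate telomere repeat


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def longest_telomere_run(seq, min_units=3):
    """Find the longest tandem run of the telomere unit on either strand.

    Returns (start, end, strand) with 0-based, end-exclusive coordinates,
    or None if no run of at least `min_units` copies exists.
    """
    seq = seq.upper()
    best = None
    units = (("+", TELOMERE_UNIT), ("-", reverse_complement(TELOMERE_UNIT)))
    for strand, unit in units:
        # Match min_units or more back-to-back copies of the unit.
        pattern = re.compile("(?:%s){%d,}" % (unit, min_units))
        for match in pattern.finditer(seq):
            span = match.end() - match.start()
            if best is None or span > best[1] - best[0]:
                best = (match.start(), match.end(), strand)
    return best
```

A fragment whose longest run sits at one of its ends is a plausible candidate for a missing chromosome tip. Real telomeres contain sequencing errors and variant repeats, so an exact-match search like this would undercount them.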

When the new genomes were compared to the earlier version, it became clear that they contained more sequence. For example, the C57BL/6J T2T genome added 208 megabase pairs (Mbp) of sequence compared to the earlier version.

Structural Accuracy

To check how accurate the new genome assemblies were, the researchers looked for structural variants (SVs), which are large-scale differences such as insertions, deletions, duplications, and inversions. They found that the T2T genome had fewer SVs than the older genome. Fewer SVs of every type were observed in the new version, suggesting that it represents the true chromosome structure more accurately.

Understanding Chromosome Structure

The new genomes allowed chromosome structure to be compared between the two strains. The new genomes contained full representations of both telomeres and centromeres on all chromosomes. The gaps in the earlier assembly were filled with new data, revealing large-scale differences between the two strains of mice.

The researchers discovered that telomeres and centromeres had been poorly represented in the old genome, and the new assemblies greatly improved this representation. The new genomes show substantial increases in satellite sequences, which are important for chromosome structure and function.

Gene Annotation and Novel Genes

Gene annotation was performed using RNA sequencing from various tissue types. The number of protein-coding genes found in the new genomes was comparable to the previous reference genome. However, the researchers identified new genes that had not been seen before.

The study found several novel genes in both strains. These novel genes varied in size and in their number of exons, which are the parts of genes that encode proteins. Some of these new genes were found to be similar to known proteins, hinting at their potential functions.

Increased Copy Number Genes

The researchers also found a number of genes that had an increased copy number in the T2T genomes compared to the previous version. This means that some genes were present in more copies in the new genomes than had previously been recorded. These genes fell into various categories, including those related to the immune system. Differences in the number of copies of certain genes were also noted between the two strains.
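To make the idea of increased copy number concrete, here is a minimal Python sketch that counts how many copies of each gene appear in two annotations and reports the gains. It assumes each annotation has already been reduced to a simple list of gene names, one entry per annotated copy. That input format and the function name are illustrative assumptions, not the study's method.

```python
from collections import Counter


def copy_number_gains(old_annotation, new_annotation):
    """Compare two annotations given as iterables of gene names
    (one entry per annotated copy).

    Returns {gene: (old_copies, new_copies)} for every gene that has
    more copies in the new annotation than in the old one.
    """
    old_counts = Counter(old_annotation)
    new_counts = Counter(new_annotation)
    return {
        gene: (old_counts[gene], new_count)
        for gene, new_count in new_counts.items()
        if new_count > old_counts[gene]  # missing genes count as 0 copies
    }
```

The same function can be run with the two strains' annotations as inputs to list genes that have more copies in one strain than in the other.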

Structure of Telomeres and Centromeres

Telomeres and centromeres play vital roles in the stability of chromosomes. The new genomes showed a significant improvement in the representation of these regions. The C57BL/6J and CAST/EiJ T2T genomes had much longer telomeres than the previous reference genome.

The researchers found that telomere lengths were generally longer in mice compared to humans. The study detailed the size and structure of telomeres and centromeres, highlighting differences between the two mouse strains.

The centromeres in mice were found to consist of specific types of satellite DNA, which had been poorly characterized in previous studies. The new genomes allowed for a clearer understanding of the centromeric regions, revealing their structural complexity.

Comparing Telocentric Chromosome Ends

Mouse chromosomes are unusual in that their centromeres are located at the very end of each chromosome, a structure known as telocentric. The researchers compared the telocentric structures in both mouse strains, revealing differences in their arrangement of repeat sequences.

The C57BL/6J strain showed a distinct repeat organization, while CAST/EiJ had a more variable organization. This highlights the diversity of chromosome structures between the two strains.

Completing the Mouse Reference Genome

Despite more than twenty years of refinement, the mouse reference genome had remained incomplete. The T2T C57BL/6J assembly successfully filled in many gaps, adding a significant amount of sequence to the mouse genome. The work revealed a total of 301 protein-coding genes in the previously missing regions.

The researchers highlighted specific regions of interest, such as those related to immune response, allowing for future studies on their function. This comprehensive approach to gap filling significantly enhances the overall quality of the mouse genome.

Pseudoautosomal Regions

The study also focused on a region called the pseudoautosomal region (PAR), which is shared between the X and Y chromosomes. The new assemblies improved the understanding of the PAR and revealed many new genes and structural features.

By comparing the PAR between the two mouse strains, the researchers noted differences in gene content and structure, indicating that even small regions of the genome can exhibit significant variation.

Inversions in the Mouse Genome

Inversions are rearrangements in the genome that can impact gene function and regulation. This study identified numerous inversions between the two mouse strains, shedding light on their origins and the role of repeated sequences in creating these structural changes.

The researchers found that many inversions were associated with large repeated segments, suggesting that these segments may play a role in the evolution and function of the genome.
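At the sequence level, an inversion means that a segment of a chromosome is flipped, so it reads as the reverse complement of the original. The Python sketch below illustrates that definition only. It is not how the study detected inversions, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def invert_segment(seq, start, end):
    """Return `seq` with the segment [start, end) inverted, i.e. replaced
    by its reverse complement. Coordinates are 0-based, end-exclusive."""
    if not 0 <= start <= end <= len(seq):
        raise ValueError("segment coordinates out of range")
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]
```

Inverting the same segment twice restores the original sequence, which is why an inversion between two strains can be described from either strain's point of view.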

KRAB Zinc Finger Proteins

KRAB zinc finger proteins (KZFPs) are critical for regulating gene expression. The new assemblies greatly improved the coverage of the genomic regions that encode these proteins, allowing for a better understanding of their roles in the genome. Differences in the number and arrangement of KZFP families were found between the two strains, indicating that these regions are subject to evolutionary changes.

Conclusion

This study marks a significant advancement in mouse genetics. By creating more complete genomes for two key strains, researchers have opened the door for new investigations into genetics and disease. These improved genomes provide a clearer picture of previously missing regions, enhancing our understanding of how these regions function and evolve.

The work offers a foundation for future studies that will explore genetic variations and their implications for health and disease. As researchers continue to build on this work, more insights into the complexities of genetics will emerge, ultimately benefiting our understanding of biological processes and the potential for developing treatments for diseases.

Original Source

Title: The structural diversity of telomeres and centromeres across mouse subspecies revealed by complete assemblies

Abstract: It is over twenty years since the publication of the C57BL/6J mouse reference genome, which has been a key catalyst for understanding mammalian disease biology. However, the mouse reference genome still lacks telomeres and centromeres, contains 281 chromosomal sequence gaps, and only partially represents many biomedically relevant loci. We present the first T2T mouse genomes for two key inbred strains, C57BL/6J and CAST/EiJ. These T2T genomes reveal significant variability in telomere and centromere sizes and structural organisation. We add an additional 213 Mbp of novel sequence to the reference genome containing 517 protein-coding genes. We examined two important but incomplete loci in the mouse genome - the pseudoautosomal region (PAR) on the sex chromosomes and KRAB zinc finger proteins (KZFPs) loci. We identified distant locations of the PAR boundary, different copy number and sizes of segmental duplications, and a multitude of amino acid substitution mutations in PAR genes.

Authors: Thomas M. Keane, B. Francis, L. Gozashti, K. Costello, T. Kasahara, O. S. Harringmeyer, J. Lilue, M. Helmy, T. Kato, A. Czechanski, M. Quail, I. Bonner, E. Dawson, A. F. Smith, L. Reinholdt, D. J. Adams

Last Update: 2024-10-26 00:00:00

Language: English

Source URL: https://www.biorxiv.org/content/10.1101/2024.10.24.619615

Source PDF: https://www.biorxiv.org/content/10.1101/2024.10.24.619615.full.pdf

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to biorxiv for use of its open access interoperability.
