Organizing the World of Biomedical Data
Learn how ontologies structure biological information for better research.
Anita R. Caron, Aleix Puig-Barbe, Ellen M. Quardokus, James P. Balhoff, Jasmine Belfiore, Nana-Jane Chipampe, Josef Hardi, Bruce W. Herr II, Huseyin Kir, Paola Roncaglia, Mark A. Musen, James A. McLaughlin, Katy Börner, David Osumi-Sutherland
Table of Contents
- The Structure of Ontologies
- The Gene Ontology Example
- Complex Relationships and Navigation
- Simplifying Complexity
- Informal Annotation in Atlases
- Challenges and Solutions
- Resident Immune Cells and Their Complications
- The Role of Data Validation
- Automated Analysis Pipelines
- Generating Simplified Views
- Communities and Collaborations
- The Advantages of Ontologies
- Limitations of Table-Based Approaches
- Alternative Approaches
- Conclusion: Navigating the Biological Maze
- Original Source
- Reference Links
When scientists talk about biomedical ontologies, they are referring to a structured way to categorize and label different kinds of biological data. Think of it like organizing your messy garage with labeled boxes. Each box contains items that are similar or related, making it easier to find what you need later on. In this case, the "items" are terms that describe biological entities, like genes, proteins, or diseases.
The idea behind using these organized structures is to ensure that data can be easily found, accessed, combined with other data, and reused. This goal is known by the acronym FAIR, which stands for Findable, Accessible, Interoperable, and Reusable. It's a bit like ensuring your garage is not only clean but that you can share it with friends and they can find their way around without bumping into things.
The Structure of Ontologies
Biomedical ontologies have a clear sense of hierarchy, similar to how a family tree branches out. At the top, you might find broad categories such as "Cells," and as you go down, you get more specific types. For instance, under "Cells," you could find "Neurons," and further down, types like "Motor Neurons."
To keep things organized, every term in an ontology has a definition that can be referenced. This ensures that everyone is speaking the same language. It’s like having a universal dictionary of biological terms. If one researcher says "B cell," everyone knows exactly what they mean.
Additionally, these terms are given unique identifiers, like social security numbers but for biological concepts. This helps different datasets to talk to each other, allowing for better collaboration among scientists.
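To make the idea of terms, definitions, and identifiers concrete, here is a minimal sketch in Python. The "EX:" identifiers, the definitions, and the Term class are invented for illustration and do not come from any real ontology; the hierarchy mirrors the cell, neuron, motor neuron example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One ontology term: a stable ID, a readable label, and a definition."""
    term_id: str
    label: str
    definition: str
    parent_ids: tuple = ()  # IDs of broader terms (empty for a root term)


# Hypothetical IDs with an invented "EX:" prefix; real ontologies issue their own.
TERMS = {
    t.term_id: t
    for t in [
        Term("EX:0001", "cell", "The basic structural unit of an organism."),
        Term("EX:0002", "neuron", "A cell that transmits nerve impulses.", ("EX:0001",)),
        Term("EX:0003", "motor neuron", "A neuron that drives muscle activity.", ("EX:0002",)),
    ]
}


def path_to_root(term_id: str) -> list[str]:
    """Follow the first parent upward and return labels from specific to broad."""
    labels = []
    current = TERMS[term_id]
    while True:
        labels.append(current.label)
        if not current.parent_ids:
            return labels
        current = TERMS[current.parent_ids[0]]


print(path_to_root("EX:0003"))
```

Because datasets store the identifier rather than the label, two groups can use different spellings of a name and still point at the same concept.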
The Gene Ontology Example
One particularly famous ontology is the Gene Ontology (GO). This tool classifies genes based on their functions, where they are located in the cell, and what biological processes they are part of. It's widely used to analyze gene data from experiments. Imagine trying to find a specific book in a library without a catalog. That's what researchers would face without something like GO.
Complex Relationships and Navigation
Ontologies are not just about lists and definitions; they also map out relationships between terms. These relationships are like connecting dots on a map. For example, if "enzyme activity" refers to a specific function, and "kinase activity" is a more specific type of enzyme activity, the relationship between them helps scientists understand how they fit together in the grand scheme of things.
All these relationships create a complex graph that shows how different entities relate to one another. This helps researchers find meaningful patterns and make connections in their data, much like piecing together a jigsaw puzzle.
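A graph like this can be sketched as a list of labeled edges. The example below is a toy, assuming a hand-written edge list and a hypothetical ancestors helper; it uses labels in place of identifiers and is not how any particular ontology tool stores its data.

```python
from collections import deque

# Each edge is (subject, relation, object). Labels stand in for IDs to keep it short.
EDGES = [
    ("kinase activity", "is_a", "enzyme activity"),
    ("enzyme activity", "is_a", "molecular function"),
    ("motor neuron", "is_a", "neuron"),
]


def ancestors(term, edges, relation="is_a"):
    """Return every term reachable from `term` by following `relation` edges upward."""
    parents = {}
    for subj, rel, obj in edges:
        if rel == relation:
            parents.setdefault(subj, set()).add(obj)
    seen = set()
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("kinase activity", EDGES)))
```

Walking upward like this is what lets a search for a broad term also return data annotated with its more specific descendants.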
Simplifying Complexity
As useful as these ontologies are, they can get pretty complicated over time. Imagine adding new boxes to your garage without throwing out the old ones. Eventually, you might end up with a room full of boxes, and it becomes hard to find anything.
Researchers often face this issue. As ontologies expand, they can become more challenging to navigate. Different scientific communities have unique needs, so the original structure might not fit everyone’s purposes. Think of it as trying to fit a square peg into a round hole.
To deal with this complexity, researchers need simplified views of ontologies, tailored to fit their specific needs. This is like saying, "I don’t need the entire garage; I just need the box labeled 'Garden Tools.'"
Informal Annotation in Atlases
In addition to structured ontologies, scientists also create informal systems to annotate anatomical and cell type atlases. Think of atlases as big-picture guides to biological data. They often use a simpler hierarchical arrangement of terms that allows users to browse related content easily.
Different projects, like the Allen Brain Atlas or the Human Lung Cell Atlas, use these simpler hierarchies to organize data based on expert opinions or existing information. They often share these hierarchies in spreadsheet formats, which is a common practice in biology. Imagine a giant spreadsheet where each row represents a different type of cell in your body, making it easy to see what’s what at a glance.
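As a rough illustration of how such a spreadsheet encodes a hierarchy, the sketch below reads a small table and turns each row into parent and child pairs. The column names and the table layout are assumptions made for this example; real atlas tables have their own formats.

```python
import csv
import io

# Hypothetical table: each row is one path from a broad structure to a finer one.
TABLE = """level_1,level_2,level_3
kidney,nephron,renal corpuscle
kidney,nephron,renal tubule
lung,alveolus,
"""


def rows_to_pairs(text):
    """Turn each row's left-to-right path into (parent, child) pairs, skipping blanks."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    pairs = set()
    for row in reader:
        path = [cell.strip() for cell in row if cell.strip()]
        pairs.update(zip(path, path[1:]))
    return pairs


for parent, child in sorted(rows_to_pairs(TABLE)):
    print(f"{parent} -> {child}")
```

Note that the table itself never says what kind of relationship links a parent to a child; that is left to the reader, which is one reason checking against a formal ontology is useful.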
Challenges and Solutions
Despite the convenience of these informal hierarchies, they have limitations. The biggest issue is that they might not always align with more formal ontologies, leading to inconsistencies. It’s as if the labels on your garage boxes no longer matched the catalog you wrote when you first organized things.
Improving the structure of these informal systems can enhance their organization. By validating these hierarchies against standard ontologies, researchers can create a more reliable framework. It’s like checking your grocery list against what’s actually in your kitchen.
Resident Immune Cells and Their Complications
Something interesting arises when trying to categorize immune cells in tissues. After all, every organ has its immune cells. Some of these cells are residents, while others come and go like unwelcome house guests. The challenge lies in distinguishing between these cell types and ensuring the ontologies reflect this accurately.
For instance, if you’re collecting data about immune cells in the kidney, you want to ensure you’re only focusing on the resident cells. Mixing up resident and non-resident cells could skew results and lead to misinterpretations. It’s like trying to identify who lives in your house when you have a party going on with friends coming and going.
The Role of Data Validation
Data validation is the process of checking whether the relationships defined in these hierarchies are accurate according to established ontologies. In this case, researchers use tools to automatically test the relationships between terms in their databases. If something doesn’t line up, it’s flagged for further investigation.
To facilitate this, researchers developed validation pipelines to regularly check their data against established structures like Uberon and the Cell Ontology. It’s like sending a friend into your garage to ensure everything is in its right place every week. If something’s not right, you’ll know it needs addressing.
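The core check can be sketched in a few lines. This is a simplified illustration, not the actual pipeline: the ONTOLOGY dictionary is a tiny hand-made stand-in for resources such as Uberon and the Cell Ontology, the function names are hypothetical, and the check only follows one relation type at a time rather than mixed chains.

```python
# Toy stand-in for an ontology: relation -> {child: set of direct parents}.
ONTOLOGY = {
    "is_a": {"motor neuron": {"neuron"}, "neuron": {"cell"}},
    "part_of": {"renal corpuscle": {"nephron"}, "nephron": {"kidney"}},
}

# Hypothetical user hierarchy as (parent, child) pairs.
USER_PAIRS = [
    ("kidney", "renal corpuscle"),
    ("neuron", "motor neuron"),
    ("kidney", "motor neuron"),
]


def supported(child, parent, direct_parents):
    """True if `parent` is reachable from `child` through the direct-parent map."""
    stack, seen = [child], set()
    while stack:
        node = stack.pop()
        for nxt in direct_parents.get(node, ()):
            if nxt == parent:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def validate(pairs, ontology):
    """Split pairs into those some relation supports and those none does."""
    ok, flagged = [], []
    for parent, child in pairs:
        if any(supported(child, parent, rel_map) for rel_map in ontology.values()):
            ok.append((parent, child))
        else:
            flagged.append((parent, child))
    return ok, flagged


ok, flagged = validate(USER_PAIRS, ONTOLOGY)
print("flagged:", flagged)
```

Running a check like this on a schedule is what turns a one-off cleanup into the weekly garage inspection described above.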
Automated Analysis Pipelines
Automated analysis pipelines take in data from tables and check the validity of relationships. They generate reports on what works and what doesn't, helping researchers improve their terms and connections. This automation simplifies the maintenance of large datasets, allowing for quicker updates and less manual checking.
For example, if the pipeline finds a relationship between "renal corpuscle" and "kidney" that doesn’t match with what's documented in the standard ontology, it can suggest corrections. This keeps the data accurate and up to date, like having a regular decluttering session in your garage.
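The reporting step might look something like the sketch below. The result tuples and the report wording are invented for illustration, and this toy version only flags unsupported pairs for review; it does not attempt to suggest a correction.

```python
from collections import Counter

# Hypothetical validation results: (parent, child, supporting relation or None).
RESULTS = [
    ("kidney", "renal corpuscle", "part_of"),
    ("neuron", "motor neuron", "is_a"),
    ("kidney", "motor neuron", None),
]


def build_report(results):
    """Summarise results as plain-text lines, with flagged pairs listed last."""
    counts = Counter(rel or "unsupported" for _, _, rel in results)
    lines = [f"{name}: {counts[name]}" for name in sorted(counts)]
    for parent, child, rel in results:
        if rel is None:
            lines.append(f"CHECK: no ontology path from '{child}' to '{parent}'")
    return lines


print("\n".join(build_report(RESULTS)))
```

Counting results by relation type also shows at a glance how much of a hierarchy rests on part-of links rather than subclass links, which matters when the hierarchy is used to group annotations.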
Generating Simplified Views
When scientists want to share their findings, they often need a cleaner, more straightforward representation of complex ontologies. Using tools that generate simplified views helps them take a large, tangled web of information and distill it into a more user-friendly format.
These simplified views allow for more accessible browsing and searching, making it easier for researchers to find what they need without getting lost in all the complexity. It’s like having a shortcut to your favorite snack in a well-organized kitchen.
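One way to picture a simplified view: keep only the terms a community cares about and link each one to its nearest kept ancestor, skipping everything in between. The sketch below assumes a toy hierarchy and a hypothetical simplified_view function; it is not the method used by any specific tool, and it does not prune redundant links when a term has several routes upward.

```python
# Toy hierarchy: child -> direct parents. Only some terms matter to a given user.
PARENTS = {
    "motor neuron": {"efferent neuron"},
    "efferent neuron": {"neuron"},
    "neuron": {"neural cell"},
    "neural cell": {"cell"},
}


def simplified_view(parents, keep):
    """Map each kept term to its nearest kept ancestors, skipping terms in between."""
    view = {}
    for term in keep:
        nearest, seen = set(), set()
        stack = list(parents.get(term, ()))
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in keep:
                nearest.add(node)  # stop climbing: this is a kept ancestor
            else:
                stack.extend(parents.get(node, ()))
        view[term] = nearest
    return view


view = simplified_view(PARENTS, {"motor neuron", "neuron", "cell"})
for term in sorted(view):
    print(term, "->", sorted(view[term]))
```

The intermediate terms are still in the full ontology; the view simply hides them, so nothing is lost if someone later needs the finer detail.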
Communities and Collaborations
Community collaboration is crucial in scientific research. Different groups work together to refine ontologies and improve their quality. Shared tools and resources help them achieve better results, allowing for easier integration of new data.
Tools that facilitate validation, like the ones mentioned earlier, encourage these collaborative efforts. Researchers can work together to address discrepancies and streamline data organization, ensuring that everyone is on the same page.
The Advantages of Ontologies
Using ontologies for data annotation comes with numerous benefits. They provide a structured way to organize information, allowing researchers to easily group annotations in meaningful ways. For example, if you wanted to study kidney function, you could quickly gather all related data from various sources using the ontology as a guide.
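The kidney example can be sketched as a small query. The dataset names and the BROADER map below are hypothetical, and subclass and part-of links are mixed into one map purely to keep the example short.

```python
# Toy data: term -> direct broader terms.
BROADER = {
    "renal corpuscle": {"nephron"},
    "nephron": {"kidney"},
    "alveolus": {"lung"},
}

# Hypothetical dataset names, each annotated with one term.
ANNOTATIONS = {
    "dataset_A": "renal corpuscle",
    "dataset_B": "kidney",
    "dataset_C": "alveolus",
}


def falls_under(term, target, broader):
    """True if `term` is `target` or sits anywhere below it."""
    stack, seen = [term], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(broader.get(node, ()))
    return False


def datasets_for(target, annotations, broader):
    """Names of all datasets annotated with `target` or one of its descendants."""
    return sorted(
        name for name, term in annotations.items() if falls_under(term, target, broader)
    )


print(datasets_for("kidney", ANNOTATIONS, BROADER))
```

Nobody had to tag dataset_A with the word "kidney"; the ontology supplies that connection.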
Additionally, ontologies allow for better communication between researchers. When everyone is using the same language and structure, collaboration becomes simpler and more effective. It's like finally agreeing on a common set of rules for a board game, making it easier to play together.
Limitations of Table-Based Approaches
While table-based approaches can be useful, they also have limitations. Simple hierarchical structures may not reflect complex biological relationships accurately, leading to oversimplifications. For example, if you categorize immune cells based only on their location, you may miss important information about their interactions.
Moreover, tables often don’t capture the richness of multiple relationships that entities may share. In biology, things are rarely black and white; they're often shades of gray. Just like your relationship with dessert: it’s complicated!
Alternative Approaches
One alternative to table-based approaches is to use more formal ontological structures that allow for multiple inheritance. This way, you can acknowledge that an entity might belong to several categories at once. For example, a cell might be part of the kidney anatomy but also participate in immune response.
Such approaches require the expertise to navigate complex relationships but can lead to more accurate and robust representations of biological knowledge. It’s akin to having a fantastic GPS that gives you various routes to reach your destination, rather than a one-size-fits-all map.
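The difference between a single-parent table and a structure with multiple inheritance can be shown with a toy example. The terms and both helper functions below are illustrative assumptions; picking the alphabetically first parent is just a stand-in for whatever choice a table builder would make.

```python
# A term may have several direct parents; a single-parent table can keep only one.
PARENTS = {
    "kidney macrophage": {"macrophage", "kidney cell"},
    "macrophage": {"immune cell"},
    "kidney cell": {"cell"},
    "immune cell": {"cell"},
}


def ancestors(term, parents):
    """Collect every broader term reachable from `term`."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def as_tree(parents):
    """Force one parent per term (alphabetically first), as a flat table would."""
    return {child: {min(options)} for child, options in parents.items()}


print(sorted(ancestors("kidney macrophage", PARENTS)))
print(sorted(ancestors("kidney macrophage", as_tree(PARENTS))))
```

In the forced tree the cell is still filed under the kidney, but its link to the immune system has disappeared, which is exactly the kind of loss described above.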
Conclusion: Navigating the Biological Maze
Navigating the world of biomedical data is no small task. With ontologies, researchers can organize and analyze complex information effectively. However, ontologies keep changing and expanding, and that growth makes them harder to navigate.
Simplifying views and using validation tools can help maintain clarity and accuracy, ensuring that scientists can make the most of the data at their disposal. It’s like keeping a clean, organized kitchen ready for the next big baking session. As science grows and evolves, so too will the structures that help organize it, making it easier for everyone to find what they need in the ever-bustling world of biological research.
Original Source
Title: A general strategy for generating expert-guided, simplified views of ontologies
Abstract: Annotation with widely used, well-structured ontologies, combined with the use of ontology-aware software tools, ensures data and analyses are Findable, Accessible, Interoperable and Reusable (FAIR). Standardized terms with synonyms support lexical search. Ontology structure supports biologically meaningful grouping of annotations (typically by location and type). However, there are significant barriers to the adoption and use of ontologies by researchers and resource developers. One barrier is complexity. Ontologies serving diverse communities are often more complex than needed for individual applications. It is common for atlases to attempt their own simplifications by manually constructing hierarchies of terms linked to ontologies, but these typically include relationship types that are not suitable for grouping annotations. Here, we present a suite of tools for validating user hierarchies against ontology structure, using them to generate graphical reports for discussion and ontology views tailored to the needs of the HuBMAP Human Reference Atlas, and the Human Developmental Cell Atlas. In both cases, validation is a source of corrections and content for both ontologies and user hierarchies.
Authors: Anita R. Caron, Aleix Puig-Barbe, Ellen M. Quardokus, James P. Balhoff, Jasmine Belfiore, Nana-Jane Chipampe, Josef Hardi, Bruce W. Herr II, Huseyin Kir, Paola Roncaglia, Mark A. Musen, James A. McLaughlin, Katy Börner, David Osumi-Sutherland
Last Update: 2024-12-18 00:00:00
Language: English
Source URL: https://www.biorxiv.org/content/10.1101/2024.12.13.628309
Source PDF: https://www.biorxiv.org/content/10.1101/2024.12.13.628309.full.pdf
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Reference Links
- https://grlc.io/api/INCAtools/ubergraph/sparql/#/default/get_cell_by_location
- https://hubmapconsortium.github.io/ccf-validation-tools/
- https://apps.humanatlas.io/asctb-api/
- https://github.com/INCATools/verificado
- https://github.com/hubmapconsortium/ubergraph2asct
- https://github.com/hubmapconsortium/validation-template
- https://pypi.org/project/ubergraph2asct/
- https://github.com/INCATools/obographviz