Simple Science

Cutting edge science explained simply

# Computer Science · Computation and Language

Transforming Open Knowledge Bases to Closed Ones

A new approach for improving knowledge base accuracy and usability.

― 7 min read


Figure: Transforming knowledge bases. New methods enhance data quality and usability.

Knowledge bases (KBs) are collections of information that help computers understand and process data. They play a crucial role in applications that require knowledge, such as answering questions, generating text, and classifying images. Building these knowledge bases automatically has been a topic of interest for researchers because of the vast amount of information available in texts.

One way to create these knowledge bases is through a method called Open Information Extraction (OpenIE). OpenIE extracts information from text by identifying relationships between different entities. For example, from the sentence "Cats chase mice," OpenIE can extract the relationship: (cats, chase, mice). While OpenIE is useful and can gather a lot of information, it also tends to include errors and ambiguities from the original texts.
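
To make the idea of an extracted triple concrete, here is a minimal sketch (not code from the paper) of how OpenIE-style output might be represented in Python. The `Triple` type and the hand-written example triples are illustrative assumptions; a real OpenIE system would produce such triples automatically from text.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """A single OpenIE-style (subject, relation, object) extraction."""
    subject: str
    relation: str
    obj: str

# Hand-written examples standing in for automatic OpenIE extractions.
open_triples = [
    Triple("cats", "chase", "mice"),
    Triple("fish", "live in", "water"),
    Triple("fish", "can be found in", "water"),  # a noisier paraphrase of the same fact
]

for t in open_triples:
    print(f"({t.subject}, {t.relation}, {t.obj})")
```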

The Challenge of Open Knowledge Bases

Open knowledge bases are formed from this extracted information. However, since the information is not always organized in a clear way, working with this data can be difficult. OpenIE creates a variety of relationships that may not have standard forms, making it hard to use this information in applications. In contrast, closed knowledge bases follow a specific structure with clear definitions of relationships, making them easier to work with.

The goal of transforming an open knowledge base into a closed one is to create more precise and usable information. This involves matching the less organized entries in the open KB to a structured framework, like that of an existing closed KB. A popular example of a closed knowledge base is ConceptNet, which provides a clear set of relationships and entities.
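
The difference between the two formats is easy to see side by side. The sketch below pairs one free-form open triple with a ConceptNet-style closed triple; `AtLocation` is a real ConceptNet relation, but this particular pairing is an illustrative assumption rather than an alignment taken from the paper's data.

```python
# Open triple: free-form relation extracted from text.
open_triple = ("fish", "live in", "water")

# Closed triple: the same fact expressed with a fixed-schema relation.
closed_triple = ("fish", "AtLocation", "water")

print(open_triple, "->", closed_triple)
```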

Why Transform Open Knowledge Bases?

While open knowledge bases have advantages like high recall (the ability to gather lots of relevant information), they often suffer from issues such as noise and unclear relationships. Transforming an open KB into a closed one can help to produce more reliable data while maintaining the benefits of high recall.

For instance, if an open KB has multiple entries about fish living in water, these can be consolidated into one entry in a closed KB, enhancing clarity and reducing redundancy. The transformed data can then be utilized in various applications, such as question-answering systems and text generation tools.
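
A small sketch of that consolidation idea, under the assumption that several open triples have already been mapped to the same closed triple (in the paper this mapping comes from a trained model, not a lookup table):

```python
from collections import Counter

# Several open triples that express the same fact in different words,
# each paired with the closed triple it maps to (pairings assumed for illustration).
mapped = [
    (("fish", "live in", "water"),            ("fish", "AtLocation", "water")),
    (("fish", "swim in", "the water"),        ("fish", "AtLocation", "water")),
    (("a fish", "can be found in", "water"),  ("fish", "AtLocation", "water")),
]

# Consolidate into one closed entry, keeping a count of supporting open triples.
support = Counter(closed for _open, closed in mapped)
for closed_triple, count in support.items():
    print(closed_triple, "supported by", count, "open triples")
```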

How the Transformation Works

To convert an open KB into a closed KB, we can think of it as a translation task. This process involves several steps, sketched as a code outline after the list:

  1. Aligning Entries: First, we need to match the entries in the open knowledge base with entries from the closed knowledge base. This alignment helps us see which open triples correspond to which closed triples.

  2. Creating a Dataset: Once we have the alignments, we can create a dataset that helps train a model to perform the mapping. This can be challenging since we want our model to learn to translate information effectively from the open format to the closed format.

  3. Training a Model: A generative language model can be trained to take an open knowledge entry and produce one or several corresponding entries in the closed knowledge format. This model learns how to make these transformations based on the examples in the dataset.

  4. Generating Output: After training, the model can be used to generate the final mappings from the open KB to the closed KB. It's crucial to ensure that the generated information stays closely tied to the original entries.

  5. Ranking the Results: The last step involves assessing the quality of the generated mappings. We aim to rank the results to ensure that the most accurate and relevant information appears at the top.
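
The outline below arranges these five steps as a pipeline of function stubs. All of the function names are hypothetical; the sketch only shows the shape of the process, and each stub hides substantial work (alignment heuristics, dataset construction, model training, scoring).

```python
# Schematic pipeline for transforming an open KB into a closed KB.
# Function names are placeholders invented for this sketch.

def align_entries(open_kb, closed_kb):
    """Step 1: pair open triples with closed triples they plausibly express."""
    ...

def build_dataset(alignments):
    """Step 2: turn aligned pairs into (input text, target text) training examples."""
    ...

def train_model(dataset):
    """Step 3: fine-tune a generative language model on those examples."""
    ...

def generate_mappings(model, open_kb):
    """Step 4: produce candidate closed triples for each open triple."""
    ...

def rank_candidates(candidates, open_kb):
    """Step 5: score candidates (e.g., by support in the open KB) and sort them."""
    ...

def transform(open_kb, closed_kb):
    alignments = align_entries(open_kb, closed_kb)
    model = train_model(build_dataset(alignments))
    candidates = generate_mappings(model, open_kb)
    return rank_candidates(candidates, open_kb)
```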

Advantages of Using Generative Models

Using a generative model has unique benefits. Unlike traditional methods that rely on fixed rules or manual annotation, a generative model can adapt to new and unseen data more effectively. This means it can handle variations in language and structure better than systems that rely solely on rules.

Additionally, the generative model can fix errors in the original open triples. If there are mistakes or unclear statements in the open KB, the model can provide cleaner and more accurate outputs in the closed KB. This cleaning ability is crucial because open knowledge bases often contain inaccuracies.

Previous Approaches

Several methods have been used in the past to tackle the problem of transforming open KBs to closed KBs. Some of these methods include:

  • Manual Mapping: This involves human experts reviewing and translating relationships from the open to the closed format. While accurate, this method is time-consuming and not scalable.

  • Rule-Based Systems: These systems use predefined rules to map relationships. They can be effective but often struggle with the complexity and variability inherent in natural language (a toy example of this brittleness appears below).

  • Classification Approaches: Some researchers have used machine learning classifiers to predict how open triples correspond to closed triples. While this can work, it often falls short when handling diverse or unseen inputs.

Each of these methods has its drawbacks, motivating the need for a more flexible and efficient approach, such as generative translation.
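
To see why fixed rules fall short, here is a toy rule-based mapper. The rule table is invented for illustration; the point is that any paraphrase outside the table simply falls through.

```python
from typing import Optional

# Invented string rules from open relations to closed, ConceptNet-style relations.
RULES = {
    "live in": "AtLocation",
    "is located in": "AtLocation",
    "is a": "IsA",
    "is used for": "UsedFor",
}

def map_relation(open_relation: str) -> Optional[str]:
    """Return the closed relation for an open one, or None if no rule matches."""
    return RULES.get(open_relation.strip().lower())

print(map_relation("live in"))          # AtLocation
print(map_relation("can be found in"))  # None: no rule covers this paraphrase
```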

Generative Translation: A New Approach

The proposed generative translation approach combines the high recall of open KBs with the precision of closed KBs. This process consists of the following steps (a short code sketch of the generation step follows the list):

  1. Data Preparation: Create and refine a dataset for the model to learn from. This involves aligning entries from both open and closed knowledge bases.

  2. Model Training: Fine-tune a generative language model, such as GPT-2, on this dataset. The model learns how to translate between the open and closed formats.

  3. Generating Mappings: Utilize the trained model to generate potential closed triples from the open triples. The model can create multiple outputs, giving a chance to find varied and accurate mappings.

  4. Scoring and Ranking: Evaluate the generated triples based on the frequency of their occurrence in the original open KB and their relevance. This scoring helps determine the best candidates for inclusion in the closed KB.

  5. Finalization: The most relevant and accurate triples are then compiled into the final closed knowledge base that can be used in applications.
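
The generation step (step 3) might look roughly like the sketch below, using the Hugging Face Transformers library. The checkpoint name `my-finetuned-gpt2` and the `open: ... closed:` prompt format are assumptions for illustration, not the exact setup used in the paper; the sketch simply shows how sampling several outputs gives the ranking step alternatives to choose from.

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Placeholder name for a GPT-2 checkpoint fine-tuned on aligned
# (open triple -> closed triple) text pairs.
model_name = "my-finetuned-gpt2"
tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Assumed prompt format: the open triple as text, asking for its closed form.
prompt = "open: (fish, live in, water) closed:"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample several candidate closed triples for the same open triple.
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=True,
    num_return_sequences=5,
    pad_token_id=tokenizer.eos_token_id,
)
candidates = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
for c in candidates:
    print(c)
```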

Results from the Generative Translation Approach

The generative translation method has demonstrated promising results. It surpasses traditional models in several aspects, including:

  • Higher Recall: The approach can maintain a broad range of information, ensuring that a greater number of triples are captured.

  • Better Precision: The results are cleaner and more structured, leading to less ambiguity in the knowledge base.

  • Flexibility: The generative model can adapt to diverse language constructs, making it suitable for a wide variety of input data.

Evaluating the Quality of Generated Knowledge Bases

After generating the closed KB, it is essential to evaluate its quality. This includes measuring:

  • Correct Mapping: Are the generated triples accurate representations of the original open triples?

  • Truthfulness: Is the information in the generated triples correct?

  • Overall Quality: How does the typicality of the statements in the new KB compare to existing benchmarks?

Human evaluators can assess the quality of the generated triples by reviewing a sample of the data. This manual assessment provides valuable insights into how well the generative model has performed in creating a structured and accurate knowledge base.
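
A minimal sketch of drawing such a sample for annotators, with placeholder triples standing in for the generated KB; the rating questions mirror the three criteria listed above.

```python
import random

# Placeholder generated triples; in practice the sample is drawn from the full closed KB.
generated = [
    ("fish", "AtLocation", "water"),
    ("cat", "CapableOf", "chase mice"),
    ("bread", "AtLocation", "bakery"),
]

random.seed(0)  # fixed seed so the sample is reproducible
sample = random.sample(generated, k=min(2, len(generated)))
for triple in sample:
    print("Rate this triple (mapping correct? truthful? typical?):", triple)
```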

Learning from Failures

It's important to recognize that not every attempt at transformation will succeed. Some generated triples may not align with the expected outcomes due to complex relationships or errors in the source data. These failures can offer insights into how the model can be improved.

For example, if certain mappings consistently fail to produce accurate results, researchers can investigate the specific characteristics of these cases and adjust the training process or modify the model architecture accordingly.

Conclusion

Transforming open knowledge bases into closed ones is a significant task that can enhance the usability and accuracy of information. The generative translation approach presents a promising solution, allowing for flexibility, precision, and the ability to clean noisy data.

By leveraging generative language models, researchers and developers can improve the quality of knowledge bases used in various applications, from intelligent question-answering systems to sophisticated text generation tools. As the field continues to evolve, there will be more opportunities to refine these methods, leading to even better outcomes in the future.

Original Source

Title: Mapping and Cleaning Open Commonsense Knowledge Bases with Generative Translation

Abstract: Structured knowledge bases (KBs) are the backbone of many knowledge-intensive applications, and their automated construction has received considerable attention. In particular, open information extraction (OpenIE) is often used to induce structure from a text. However, although it allows high recall, the extracted knowledge tends to inherit noise from the sources and the OpenIE algorithm. Besides, OpenIE tuples contain an open-ended, non-canonicalized set of relations, making the extracted knowledge's downstream exploitation harder. In this paper, we study the problem of mapping an open KB into the fixed schema of an existing KB, specifically for the case of commonsense knowledge. We propose approaching the problem by generative translation, i.e., by training a language model to generate fixed-schema assertions from open ones. Experiments show that this approach occupies a sweet spot between traditional manual, rule-based, or classification-based canonicalization and purely generative KB construction like COMET. Moreover, it produces higher mapping accuracy than the former while avoiding the association-based noise of the latter.

Authors: Julien Romero, Simon Razniewski

Last Update: 2023-06-22

Language: English

Source URL: https://arxiv.org/abs/2306.12766

Source PDF: https://arxiv.org/pdf/2306.12766

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arxiv for use of its open access interoperability.
