Simple Science

Cutting-edge science explained simply

Computer Science | Computation and Language | Artificial Intelligence

Leveraging Large Language Models for Knowledge Graph Construction

This study showcases how LLMs can aid in knowledge graph building.

― 6 min read


[Figure: Examining LLMs' potential in knowledge base predictions]

Large Language Models (LLMs) are changing the way we think about working with information. They can perform a variety of tasks, such as understanding text, classifying it, and recognizing named entities. Recently, models like ChatGPT and GPT-4 from OpenAI have proven very effective at these tasks, and the main focus has shifted to how to prompt them effectively for the best results.

Knowledge Graphs are a way to represent information that allows machines to understand and reason about facts. However, creating these knowledge graphs is complex, whether done automatically or with human help. Wikidata is one of the largest knowledge graphs available, filled with information about real-world entities, and has been built through the contributions of many people.

While past research has looked into using LLMs for building knowledge graphs, recent improvements in LLMs have sparked renewed interest. Though LLMs hold great potential for knowledge engineering, there are key differences between them and knowledge graphs: knowledge graphs store facts under strict structural constraints, while LLMs do not always follow the same kind of logical rigor.

Additionally, LLMs are mainly trained on publicly available data, leading them to have in-depth knowledge in popular subjects but less information on lesser-known topics. This work aims to shed light on how LLMs can be used for knowledge engineering with a focus on the ISWC 2023 LM-KBC Challenge.

What Was the Challenge About?

The challenge involved predicting object entities based on a subject entity and a relation taken from Wikidata. For example, if the subject is "Robert Bosch LLC" and the relation is "CompanyHasParentOrganisation," the task is to predict the relevant objects, like "Robert Bosch," and link them to their corresponding Wikidata IDs.
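Concretely, each challenge instance pairs a subject with a relation, and a system must return the object strings plus their Wikidata QIDs. A minimal illustration of the shape of the data follows; the field names and the QID are purely illustrative, not the dataset's exact schema:

```python
# One challenge instance (field names are illustrative, not the exact schema).
instance = {
    "SubjectEntity": "Robert Bosch LLC",
    "Relation": "CompanyHasParentOrganisation",
}

# Expected output: object strings linked to Wikidata identifiers.
# The QID here is a placeholder, not the real identifier.
prediction = {
    "ObjectEntities": ["Robert Bosch"],
    "ObjectEntitiesID": ["Q0000000"],
}
```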

To tackle this, we used two top-performing LLMs: gpt-3.5-turbo and GPT-4. By experimenting with different approaches, we achieved a macro-averaged F1 score of 0.701, though performance varied considerably depending on the type of relation being examined: some relations saw perfect scores (1.00), while others fared much worse (down to 0.328).
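For reference, the headline metric averages a per-instance set F1 within each relation and then across relations. Here is a minimal sketch of that computation (our own illustration, not the official challenge scorer):

```python
def set_f1(predicted: set, gold: set) -> float:
    """F1 between a predicted and a gold set of objects.
    By a common convention, two empty sets count as a perfect match."""
    if not predicted and not gold:
        return 1.0
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)            # true positives: correct objects
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_relation_scores: dict[str, list[float]]) -> float:
    """Average instance F1 within each relation, then across relations."""
    relation_means = [sum(s) / len(s) for s in per_relation_scores.values()]
    return sum(relation_means) / len(relation_means)
```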

Related Work on Knowledge Probing

A lot of research has explored how well LLMs handle knowledge-intensive tasks. Previous studies have looked at using language models to build or complete knowledge graphs. For example, one early study, LAMA, probed language models for facts using cloze-style (fill-in-the-blank) prompts. More recent efforts have analyzed the use of LLMs for these tasks in greater depth.

As a result, many new benchmarks and datasets have been created to assess how well LLMs perform on knowledge-related tasks. These benchmarks cover various scenarios, such as answering questions and completing facts, using information from knowledge graphs. LAMA is one of the pioneering datasets, constructed from numerous knowledge sources, and it has inspired further improvements in assessing LLM capabilities.

Our Methods Explained

The task at hand was to predict a set of objects based on a subject and relation. We built a pipeline that involved two main steps: knowledge probing and entity mapping to Wikidata.
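In outline, the two steps chain together as below. Every helper here is a hypothetical stub named only for illustration; the following sections describe what the real steps do:

```python
def build_prompt(subject: str, relation: str) -> str:
    # Hypothetical helper: format the probing prompt (see next section).
    return f"{subject}, {relation}:"

def query_llm(prompt: str) -> list[str]:
    # Hypothetical stand-in for a chat-completion call; a real system
    # would send the prompt to gpt-3.5-turbo or GPT-4 and parse the reply.
    return ["Robert Bosch"]

def map_to_wikidata(label: str) -> str:
    # Hypothetical stand-in for the entity-mapping step (see below).
    return "Q0000000"  # placeholder QID

def predict(subject: str, relation: str) -> list[str]:
    # Step 1: knowledge probing, then Step 2: Wikidata entity mapping.
    object_strings = query_llm(build_prompt(subject, relation))
    return [map_to_wikidata(s) for s in object_strings]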

Knowledge Probing

In our probing step, we created specific prompt templates to gather knowledge from LLMs. We tested three different setups, sketched in code after this list:

  1. Question Prompting: Here, we asked LLMs direct questions. For example, "Which countries share borders with Brazil?"

  2. Triple Completion Prompting: In this setup, we provided incomplete triples, such as "River Thames, RiverBasinsCountry:" and asked the model to fill in the blanks.

  3. Context-Aided Prompting: In this case, we provided additional context alongside the questions to help the models make better predictions.

When using context, we first let the LLMs predict based on their own knowledge. Then we introduced the relevant information and prompted them to re-evaluate their responses.

In all cases, we included examples to help LLMs understand the expected format of their responses better.
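To make the three setups concrete, here is a minimal sketch of how such templates might be assembled. The wording and few-shot examples are illustrative; the paper's exact templates may differ:

```python
# Few-shot examples show the model the expected answer format.
QA_SHOT = (
    "Q: Which countries share borders with Switzerland?\n"
    "A: France, Germany, Italy, Austria, Liechtenstein\n\n"
)
TRIPLE_SHOT = "River Thames, RiverBasinsCountry: United Kingdom\n\n"

def question_prompt(question: str) -> str:
    # 1. Question prompting: ask the model directly.
    return QA_SHOT + f"Q: {question}\nA:"

def triple_prompt(subject: str, relation: str) -> str:
    # 2. Triple completion: present an incomplete triple to fill in.
    return TRIPLE_SHOT + f"{subject}, {relation}:"

def context_prompt(question: str, first_answer: str, context: str) -> str:
    # 3. Context-aided prompting: the model answers once from its own
    # knowledge, then re-evaluates after seeing retrieved context.
    return (
        QA_SHOT
        + f"Q: {question}\nA: {first_answer}\n\n"
        + f"Context: {context}\n"
        + "Given the context above, revise your answer if needed.\nA:"
    )
```

Each prompt string would then be sent to a chat-completion endpoint for gpt-3.5-turbo or GPT-4, and the comma-separated answer parsed back into a list of object strings.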

Wikidata Entity Mapping

The next step was to match the predicted object strings to actual entities in Wikidata using the platform's public search API. We looked for possible matches based on labels and aliases and then selected the correct entities from among the candidates. To refine this selection, we developed several improved methods (a simplified sketch follows the list):

  • Case-Based Method: A method designed for relations whose answer space is small, making candidate selection more tractable.

  • Keyword-Based Method: This method looked at descriptions of candidates and matched them to relevant keywords.

  • Language Model-Based Approach: Here, we built a dictionary of candidate IDs and relied on LLMs to choose the right entity based on more complex distinctions.
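Wikidata exposes a public search endpoint (wbsearchentities) that returns ranked candidates with labels and descriptions. Below is a minimal sketch of the keyword-based idea built on that endpoint; the selection heuristic is a simplified stand-in for the paper's actual methods:

```python
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def search_candidates(label: str, limit: int = 5) -> list[dict]:
    """Look up candidate entities for a predicted object string
    via Wikidata's public wbsearchentities endpoint."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": "en",
        "format": "json",
        "limit": limit,
    }
    resp = requests.get(WIKIDATA_API, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json().get("search", [])

def keyword_disambiguate(label: str, keywords: list[str]) -> str | None:
    """Simplified keyword-based selection (our illustration): prefer the
    candidate whose description mentions a relation-specific keyword,
    otherwise fall back to the top-ranked match."""
    candidates = search_candidates(label)
    for cand in candidates:
        description = cand.get("description", "").lower()
        if any(kw in description for kw in keywords):
            return cand["id"]
    return candidates[0]["id"] if candidates else None
```

For example, `keyword_disambiguate("Robert Bosch", ["company", "manufacturer"])` would prefer a candidate whose description mentions a company over, say, the person of the same name.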

Results from Our Study

For our study, we used a dataset from the LM-KBC Challenge, consisting of various relation types covering different domains, like music, geography, and sports. The training, validation, and test sets each contained 1,940 statements.

In our evaluation, GPT-4 outperformed gpt-3.5-turbo. When we allowed the models to use external context in their predictions, it often led to better performance, especially for gpt-3.5-turbo. However, for GPT-4, the added context did not always improve results across the board.

Our observations also indicated that LLMs performed well with relations that had limited domains but struggled with relations that involved broader topics. For instance, they handled "PersonHasNobelPrize" effectively but faced challenges with "PersonHasEmployer," likely due to less information available about many individuals.

Discussion on Wikidata and Knowledge Gaps

While working with Wikidata, we identified issues with the quality of the information stored there. Some entities lacked necessary details, and many entries did not follow the stated constraints. This incompleteness points to the potential for LLMs to help improve the quality of Wikidata by suggesting missing information.

Moreover, we found a knowledge gap between Wikipedia and Wikidata, which sometimes led to discrepancies in model performance. For some relations, the information in Wikipedia was more recent or accurate than that in Wikidata. This gap highlights the role LLMs could play in helping to keep data up-to-date.

Conclusion

This work aimed to demonstrate the potential of LLMs in predicting objects for knowledge graphs through the ISWC 2023 LM-KBC Challenge. We achieved notable results, with our best method reaching a macro-averaged F1 score of 0.701 across various relations. While LLMs can be valuable tools for completing knowledge bases, their limitations also underscore the need for human intervention to ensure data accuracy.

The findings encourage further exploration of how LLMs can work alongside human editors to enhance the quality and completeness of information in knowledge systems.

Original Source

Title: Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata

Abstract: In this work, we explore the use of Large Language Models (LLMs) for knowledge engineering tasks in the context of the ISWC 2023 LM-KBC Challenge. For this task, given subject and relation pairs sourced from Wikidata, we utilize pre-trained LLMs to produce the relevant objects in string format and link them to their respective Wikidata QIDs. We developed a pipeline using LLMs for Knowledge Engineering (LLMKE), combining knowledge probing and Wikidata entity mapping. The method achieved a macro-averaged F1-score of 0.701 across the properties, with the scores varying from 1.00 to 0.328. These results demonstrate that the knowledge of LLMs varies significantly depending on the domain and that further experimentation is required to determine the circumstances under which LLMs can be used for automatic Knowledge Base (e.g., Wikidata) completion and correction. The investigation of the results also suggests the promising contribution of LLMs in collaborative knowledge engineering. LLMKE won Track 2 of the challenge. The implementation is available at https://github.com/bohuizhang/LLMKE.

Authors: Bohui Zhang, Ioannis Reklos, Nitisha Jain, Albert Meroño Peñuela, Elena Simperl

Last Update: 2023-09-15

Language: English

Source URL: https://arxiv.org/abs/2309.08491

Source PDF: https://arxiv.org/pdf/2309.08491

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arXiv for use of its open access interoperability.
