Simple Science

Cutting-edge science explained simply

Computer Science | Computation and Language | Artificial Intelligence

Leveraging Large Language Models for Knowledge Graph Construction

This study showcases how LLMs can aid in knowledge graph building.

― 6 min read


[Figure: Examining LLMs' potential in knowledge base predictions]

Large Language Models (LLMs) are changing the way we think about working with information. They can perform a variety of tasks, such as understanding text, classifying it, and recognizing named entities. Recently, models like ChatGPT and GPT-4 from OpenAI have proven very effective at these tasks, and the main focus has shifted to how to prompt them effectively for the best results.

Knowledge Graphs are a way to represent information that allows machines to understand and reason about facts. However, creating these knowledge graphs is complex, whether done automatically or with human help. Wikidata is one of the largest knowledge graphs available, filled with information about real-world entities, and has been built through the contributions of many people.

While past research has looked into using LLMs for building knowledge graphs, recent improvements in LLMs have sparked renewed interest. Though LLMs hold great potential for knowledge engineering, there are key differences between them and knowledge graphs: knowledge graphs store facts under strict structural constraints, while LLMs do not always follow the same kind of logical rigor.

Additionally, LLMs are mainly trained on publicly available data, leading them to have in-depth knowledge in popular subjects but less information on lesser-known topics. This work aims to shed light on how LLMs can be used for knowledge engineering with a focus on the ISWC 2023 LM-KBC Challenge.

What Was the Challenge About?

The challenge involved predicting object entities based on a subject entity and a relation taken from Wikidata. For example, if the subject is "Robert Bosch LLC" and the relation is "CompanyHasParentOrganisation," the task is to predict the relevant objects, like "Robert Bosch," and link them to their corresponding Wikidata IDs.
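Concretely, each challenge instance pairs a subject with a relation, and a system must return the object strings plus their Wikidata QIDs. A minimal illustration of the shape of the data follows; the field names and the QID are purely illustrative, not the dataset's exact schema:

```python
# One challenge instance (field names are illustrative, not the exact schema).
instance = {
    "SubjectEntity": "Robert Bosch LLC",
    "Relation": "CompanyHasParentOrganisation",
}

# Expected output: object strings linked to Wikidata identifiers.
# The QID here is a placeholder, not the real identifier.
prediction = {
    "ObjectEntities": ["Robert Bosch"],
    "ObjectEntitiesID": ["Q0000000"],
}
```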

To tackle this, we used two top-performing LLMs: gpt-3.5-turbo and GPT-4. By experimenting with different approaches, we achieved a macro-averaged F1 score of 0.701, though performance varied considerably depending on the type of relation being examined: some relations saw perfect scores (1.00), while others fared much worse (down to 0.328).
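For reference, the headline metric averages a per-instance set F1 within each relation and then across relations. Here is a minimal sketch of that computation (our own illustration, not the official challenge scorer):

```python
def set_f1(predicted: set, gold: set) -> float:
    """F1 between a predicted and a gold set of objects.
    By a common convention, two empty sets count as a perfect match."""
    if not predicted and not gold:
        return 1.0
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)            # true positives: correct objects
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_relation_scores: dict[str, list[float]]) -> float:
    """Average instance F1 within each relation, then across relations."""
    relation_means = [sum(s) / len(s) for s in per_relation_scores.values()]
    return sum(relation_means) / len(relation_means)
```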

Related Work on Knowledge Probing

A lot of research has explored how well LLMs handle knowledge-intensive tasks. Previous studies have looked at using language models to build or complete knowledge graphs. For example, one early study, LAMA, probed language models for facts using cloze-style (fill-in-the-blank) prompts. More recent efforts have analyzed the use of LLMs for these tasks in greater depth.

As a result, many new benchmarks and datasets have been created to assess how well LLMs perform on knowledge-related tasks. These benchmarks cover various scenarios, such as answering questions and completing facts, using information from knowledge graphs. LAMA is one of the pioneering datasets, constructed from numerous knowledge sources, and it has inspired further improvements in assessing LLM capabilities.

Our Methods Explained

The task at hand was to predict a set of objects based on a subject and relation. We built a pipeline that involved two main steps: knowledge probing and entity mapping to Wikidata.
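In outline, the two steps chain together as below. Every helper here is a hypothetical stub named only for illustration; the following sections describe what the real steps do:

```python
def build_prompt(subject: str, relation: str) -> str:
    # Hypothetical helper: format the probing prompt (see next section).
    return f"{subject}, {relation}:"

def query_llm(prompt: str) -> list[str]:
    # Hypothetical stand-in for a chat-completion call; a real system
    # would send the prompt to gpt-3.5-turbo or GPT-4 and parse the reply.
    return ["Robert Bosch"]

def map_to_wikidata(label: str) -> str:
    # Hypothetical stand-in for the entity-mapping step (see below).
    return "Q0000000"  # placeholder QID

def predict(subject: str, relation: str) -> list[str]:
    # Step 1: knowledge probing, then Step 2: Wikidata entity mapping.
    object_strings = query_llm(build_prompt(subject, relation))
    return [map_to_wikidata(s) for s in object_strings]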

Knowledge Probing

In our probing step, we created specific prompt templates to gather knowledge from LLMs. We tested three different setups, sketched in code after this list:

  1. Question Prompting: Here, we asked LLMs direct questions. For example, "Which countries share borders with Brazil?"

  2. Triple Completion Prompting: In this setup, we provided incomplete triples, such as "River Thames, RiverBasinsCountry:" and asked the model to fill in the blanks.

  3. Context-Aided Prompting: In this case, we provided additional context alongside the questions to help the models make better predictions.

When using context, we first let the LLMs predict based on their own knowledge. Then we introduced the relevant information and prompted them to re-evaluate their responses.

In all cases, we included examples to help LLMs understand the expected format of their responses better.
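To make the three setups concrete, here is a minimal sketch of how such templates might be assembled. The wording and few-shot examples are illustrative; the paper's exact templates may differ:

```python
# Few-shot examples show the model the expected answer format.
QA_SHOT = (
    "Q: Which countries share borders with Switzerland?\n"
    "A: France, Germany, Italy, Austria, Liechtenstein\n\n"
)
TRIPLE_SHOT = "River Thames, RiverBasinsCountry: United Kingdom\n\n"

def question_prompt(question: str) -> str:
    # 1. Question prompting: ask the model directly.
    return QA_SHOT + f"Q: {question}\nA:"

def triple_prompt(subject: str, relation: str) -> str:
    # 2. Triple completion: present an incomplete triple to fill in.
    return TRIPLE_SHOT + f"{subject}, {relation}:"

def context_prompt(question: str, first_answer: str, context: str) -> str:
    # 3. Context-aided prompting: the model answers once from its own
    # knowledge, then re-evaluates after seeing retrieved context.
    return (
        QA_SHOT
        + f"Q: {question}\nA: {first_answer}\n\n"
        + f"Context: {context}\n"
        + "Given the context above, revise your answer if needed.\nA:"
    )
```

Each prompt string would then be sent to a chat-completion endpoint for gpt-3.5-turbo or GPT-4, and the comma-separated answer parsed back into a list of object strings.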

Wikidata Entity Mapping

The next step was to match the predicted object strings to actual entities in Wikidata using the platform's public search API. We looked for possible matches based on labels and aliases and then selected the correct entities from among the candidates. To refine this selection, we developed several improved methods (a simplified sketch follows the list):

  • Case-Based Method: A method designed for relations whose answer space is small, making candidate selection more tractable.

  • Keyword-Based Method: This method looked at descriptions of candidates and matched them to relevant keywords.

  • Language Model-Based Approach: Here, we built a dictionary of candidate IDs and relied on LLMs to choose the right entity based on more complex distinctions.
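Wikidata exposes a public search endpoint (wbsearchentities) that returns ranked candidates with labels and descriptions. Below is a minimal sketch of the keyword-based idea built on that endpoint; the selection heuristic is a simplified stand-in for the paper's actual methods:

```python
import requests

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def search_candidates(label: str, limit: int = 5) -> list[dict]:
    """Look up candidate entities for a predicted object string
    via Wikidata's public wbsearchentities endpoint."""
    params = {
        "action": "wbsearchentities",
        "search": label,
        "language": "en",
        "format": "json",
        "limit": limit,
    }
    resp = requests.get(WIKIDATA_API, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json().get("search", [])

def keyword_disambiguate(label: str, keywords: list[str]) -> str | None:
    """Simplified keyword-based selection (our illustration): prefer the
    candidate whose description mentions a relation-specific keyword,
    otherwise fall back to the top-ranked match."""
    candidates = search_candidates(label)
    for cand in candidates:
        description = cand.get("description", "").lower()
        if any(kw in description for kw in keywords):
            return cand["id"]
    return candidates[0]["id"] if candidates else None
```

For example, `keyword_disambiguate("Robert Bosch", ["company", "manufacturer"])` would prefer a candidate whose description mentions a company over, say, the person of the same name.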

Results from Our Study

For our study, we used a dataset from the LM-KBC Challenge, consisting of various relation types covering different domains, like music, geography, and sports. The training, validation, and test sets each contained 1,940 statements.

In our evaluation, GPT-4 outperformed gpt-3.5-turbo. When we allowed the models to use external context in their predictions, it often led to better performance, especially for gpt-3.5-turbo. However, for GPT-4, the added context did not always improve results across the board.

Our observations also indicated that LLMs performed well with relations that had limited domains but struggled with relations that involved broader topics. For instance, they handled "PersonHasNobelPrize" effectively but faced challenges with "PersonHasEmployer," likely due to less information available about many individuals.

Discussion on Wikidata and Knowledge Gaps

While working with Wikidata, we identified issues with the quality of the information stored there. Some entities lacked necessary details, and many entries did not follow the stated constraints. This incompleteness points to the potential for LLMs to help improve the quality of Wikidata by suggesting missing information.

Moreover, we found a knowledge gap between Wikipedia and Wikidata, which sometimes led to discrepancies in model performance. For some relations, the information in Wikipedia was more recent or accurate than that in Wikidata. This gap highlights the role LLMs could play in helping to keep data up-to-date.

Conclusion

This work aimed to demonstrate the potential of LLMs in predicting objects for knowledge graphs through the ISWC 2023 LM-KBC Challenge. We achieved notable results, with our best method reaching a macro-averaged F1 score of 0.701 across various relations. While LLMs can be valuable tools for completing knowledge bases, their limitations also underscore the need for human intervention to ensure data accuracy.

The findings encourage further exploration of how LLMs can work alongside human editors to enhance the quality and completeness of information in knowledge systems.

Original Source

Title: Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata

Abstract: In this work, we explore the use of Large Language Models (LLMs) for knowledge engineering tasks in the context of the ISWC 2023 LM-KBC Challenge. For this task, given subject and relation pairs sourced from Wikidata, we utilize pre-trained LLMs to produce the relevant objects in string format and link them to their respective Wikidata QIDs. We developed a pipeline using LLMs for Knowledge Engineering (LLMKE), combining knowledge probing and Wikidata entity mapping. The method achieved a macro-averaged F1-score of 0.701 across the properties, with the scores varying from 1.00 to 0.328. These results demonstrate that the knowledge of LLMs varies significantly depending on the domain and that further experimentation is required to determine the circumstances under which LLMs can be used for automatic Knowledge Base (e.g., Wikidata) completion and correction. The investigation of the results also suggests the promising contribution of LLMs in collaborative knowledge engineering. LLMKE won Track 2 of the challenge. The implementation is available at https://github.com/bohuizhang/LLMKE.

Authors: Bohui Zhang, Ioannis Reklos, Nitisha Jain, Albert Meroño Peñuela, Elena Simperl

Last Update: 2023-09-15

Language: English

Source URL: https://arxiv.org/abs/2309.08491

Source PDF: https://arxiv.org/pdf/2309.08491

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arXiv for use of its open access interoperability.
