# TextGraphs: raw texts, LLMs, and KGs, oh my!
<img src="assets/logo.png" width="113" alt="illustration of a lemma graph"/>
Welcome to the **TextGraphs** library...
- demo: <https://huggingface.co/spaces/DerwenAI/textgraphs>
- code: <https://github.com/DerwenAI/textgraphs>
- biblio: <https://derwen.ai/docs/txg/biblio>
- DOI: 10.5281/zenodo.10431783
## Overview
_Explore uses of large language models (LLMs) in semi-automated knowledge graph (KG) construction from unstructured text sources, with human-in-the-loop (HITL) affordances to incorporate guidance from domain experts._
What is "generative AI" in the context of working with knowledge graphs?
Initial attempts tend to fit a simple pattern based on _prompt engineering_: present text sources to an LLM-based chat interface and ask it to generate an entire graph.
This is generally expensive, and the results are often poor.
Moreover, the lack of controls or curation in this approach represents a serious disconnect with how KGs get curated to represent an organization's domain expertise.
Can the definition of "generative" be reformulated for KGs?
Instead of trying to use a fully-automated "black box", what if it were possible to generate _composable elements_ which then get aggregated into a KG?
Some research in topological analysis of graphs indicates potential ways to decompose graphs, which can then be re-composed probabilistically.
While the mathematics may be sound, these techniques need to be understood in the context of the full range of tasks within KG-construction workflows to assess how they apply to real-world graph data.
This project explores the use of LLM-augmented components within natural language workflows, focusing on small, well-defined tasks within the scope of KG construction.
To address the challenges involved, this project considers improved means of tokenization for handling input.
In addition, a range of methods is considered for filtering and selecting elements of the output stream and re-composing them into KGs.
This has the side effect of providing steps toward better pattern identification and variable abstraction layers for graph data, for _graph levels of detail_ (GLOD).
Many papers aim to evaluate benchmarks; in contrast, this line of inquiry focuses on integration:
means of combining multiple complementary research projects;
how to evaluate the outcomes of other projects to assess their potential usefulness in production-quality libraries;
and suggested directions for improving the LLM-based components of NLP workflows used to construct KGs.
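
As a rough illustration of the "composable elements" idea described above, the following minimal sketch filters candidate edges and aggregates the survivors into a small graph. It does not use the TextGraphs API: the `Triple` class, the `compose_graph` function, the `min_score` threshold, the `rejected` set standing in for expert feedback, and the sample data are all hypothetical names and values chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One candidate KG element: a (subject, relation, object) edge."""
    subj: str
    rel: str
    obj: str
    score: float  # hypothetical confidence from whichever component emitted it


def compose_graph(candidates, min_score=0.5, rejected=frozenset()):
    """
    Filter candidate triples, then aggregate the survivors into an
    adjacency map. `rejected` holds (subj, rel, obj) keys vetoed by a
    domain expert, standing in for a human-in-the-loop step.
    """
    best = {}

    for triple in candidates:
        key = (triple.subj, triple.rel, triple.obj)

        if key in rejected or triple.score < min_score:
            continue

        # when several components propose the same edge, keep the top score
        if key not in best or triple.score > best[key]:
            best[key] = triple.score

    graph = defaultdict(list)

    for (subj, rel, obj), score in best.items():
        graph[subj].append((rel, obj, score))

    return dict(graph)


# made-up candidate stream, as if emitted by several upstream components
candidates = [
    Triple("Werner Herzog", "born_in", "Munich", 0.91),
    Triple("Werner Herzog", "born_in", "Munich", 0.64),
    Triple("Werner Herzog", "occupation", "film director", 0.88),
    Triple("Munich", "located_in", "France", 0.72),
    Triple("Werner Herzog", "occupation", "chef", 0.12),
]

# an expert vetoes one plausible-looking but wrong edge
rejected = {("Munich", "located_in", "France")}

kg = compose_graph(candidates, min_score=0.5, rejected=rejected)
```

In this sketch each stage (scoring, thresholding, expert veto, aggregation) is a separate, inspectable step, in contrast to asking a chat interface for an entire graph in one pass.
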
## Index Terms
_natural language processing_,
_knowledge graph construction_,
_large language models_,
_entity extraction_,
_entity linking_,
_relation extraction_,
_semantic random walk_,
_human-in-the-loop_,
_topological decomposition of graphs_,
_graph levels of detail_,
_network motifs_,