docs/objectives.md · DerwenAI/textgraphs at main

Consider three classes of composable elements which are needed for constructing KGs: nodes, edges, properties. Several areas of machine learning (ML) research can be leveraged to generate these elements from unstructured text sources:

nodes: NER, node prediction
edges: relation extraction (RE), semantic inference, link prediction
properties: NLP parse, entity linking, graph analytics

Weights or probabilities from the analysis can also be used to construct gradients for ranking each class of elements in the generated output. This supports multiple approaches for filtering, selection, and abstraction of the generated composable elements, and helps incorporate domain expertise.

A set of questions follows from this line of inquiry:

RQ1: can workflows be defined which integrate LLM-based components and generate composable elements for KGs, while managing the quality of the generated results?

RQ2: can topological analysis and decomposition of graph data help inform better ways to generating graph elements, e.g., by leveraging patterns within graphs (network motifs) and graph abstraction layers?

RQ3: where might it be possible to improve data quality for -- training data, benchmarks, evals, etc. -- then iterate to train more effective LLM-based components?

RQ4: how can consistent evaluations of open source related to ML research be made, assessing opportunities for reusing code in production-quality libraries?