# Reference: `textgraphs` package | |
<img src='../assets/nouns/api.png' alt='API by Adnen Kadri from the Noun Project' /> | |
Package definitions for the `TextGraphs` library. | |
see copyright/license | |
## [`TextGraphs` class](#TextGraphs) | |
Construct a _lemma graph_ from the unstructured text source, | |
then extract ranked phrases using a `textgraph` algorithm. | |
--- | |
#### [`infer_relations_async` method](#textgraphs.TextGraphs.infer_relations_async) | |
[*\[source\]*]( | |
```python | |
infer_relations_async(pipe, debug=False) | |
``` | |
Gather triples representing inferred relations and build edges, | |
concurrently by running an async queue. | |
Make sure to call beforehand: `TextGraphs.collect_graph_elements()` | |
* `pipe` : `textgraphs.pipe.Pipeline` | |
configured pipeline for this document | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `typing.List[textgraphs.elem.Edge]` | |
a list of the inferred `Edge` objects | |
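The concurrent pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the `Triple` shape and the `produce_triples` and `gather_edges` helpers are hypothetical stand-ins for relation-extraction components that put their results on a shared `asyncio.Queue`.

```python
import asyncio
from typing import Dict, List, Tuple

# hypothetical triple shape: (source, relation, destination)
Triple = Tuple[str, str, str]


async def produce_triples(triples: List[Triple], queue: asyncio.Queue) -> None:
    """Stand-in for one relation-extraction component feeding the queue."""
    for triple in triples:
        await asyncio.sleep(0)  # yield control, as a real model call would
        await queue.put(triple)


async def gather_edges(sources: Dict[str, List[Triple]]) -> List[Triple]:
    """Run all producers concurrently, then drain the queue into a list."""
    queue: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(
        *(produce_triples(triples, queue) for triples in sources.values())
    )

    edges: List[Triple] = []
    while not queue.empty():
        edges.append(queue.get_nowait())
    return edges


inferred = asyncio.run(gather_edges({
    "model_a": [("werner herzog", "occupation", "film director")],
    "model_b": [("werner herzog", "country_of_citizenship", "germany")],
}))
```

The order in which triples arrive depends on scheduling, so callers should not rely on it.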
--- | |
#### [`__init__` method](#textgraphs.TextGraphs.__init__) | |
[*\[source\]*]( | |
```python | |
__init__(factory=None, iri_base="") | |
``` | |
Constructor. | |
* `factory` : `typing.Optional[textgraphs.pipe.PipelineFactory]` | |
optional `PipelineFactory` used to configure components | |
--- | |
#### [`create_pipeline` method](#textgraphs.TextGraphs.create_pipeline) | |
[*\[source\]*]( | |
```python | |
create_pipeline(text_input) | |
``` | |
Use the pipeline factory to create a pipeline (e.g., `spaCy.Document`)
for each text input, which is typically paragraph-length.
* `text_input` : `str` | |
raw text to be parsed by this pipeline | |
* *returns* : `textgraphs.pipe.Pipeline` | |
a configured pipeline | |
--- | |
#### [`create_render` method](#textgraphs.TextGraphs.create_render) | |
[*\[source\]*]( | |
```python | |
create_render() | |
``` | |
Create an object for rendering the graph in `PyVis` HTML+JavaScript. | |
* *returns* : `textgraphs.vis.RenderPyVis` | |
a configured `RenderPyVis` object for generating graph visualizations | |
--- | |
#### [`collect_graph_elements` method](#textgraphs.TextGraphs.collect_graph_elements) | |
[*\[source\]*]( | |
```python | |
collect_graph_elements(pipe, text_id=0, para_id=0, debug=False) | |
``` | |
Collect the elements of a _lemma graph_ from the results of running | |
the `textgraph` algorithm. These elements include: parse dependencies, | |
lemmas, entities, and noun chunks. | |
Make sure to call beforehand: `TextGraphs.create_pipeline()` | |
* `pipe` : `textgraphs.pipe.Pipeline` | |
configured pipeline for this document | |
* `text_id` : `int` | |
text (top-level document) identifier | |
* `para_id` : `int` | |
paragraph identifier
* `debug` : `bool` | |
debugging flag | |
--- | |
#### [`construct_lemma_graph` method](#textgraphs.TextGraphs.construct_lemma_graph) | |
[*\[source\]*]( | |
```python | |
construct_lemma_graph(debug=False) | |
``` | |
Construct the base level of the _lemma graph_ from the collected | |
elements. This gets represented in `NetworkX` as a directed graph | |
with parallel edges. | |
Make sure to call beforehand: `TextGraphs.collect_graph_elements()` | |
* `debug` : `bool` | |
debugging flag | |
--- | |
#### [`perform_entity_linking` method](#textgraphs.TextGraphs.perform_entity_linking) | |
[*\[source\]*]( | |
```python | |
perform_entity_linking(pipe, debug=False) | |
``` | |
Perform _entity linking_ based on the `KnowledgeGraph` object. | |
Make sure to call beforehand: `TextGraphs.collect_graph_elements()` | |
* `pipe` : `textgraphs.pipe.Pipeline` | |
configured pipeline for this document | |
* `debug` : `bool` | |
debugging flag | |
--- | |
#### [`infer_relations` method](#textgraphs.TextGraphs.infer_relations) | |
[*\[source\]*]( | |
```python | |
infer_relations(pipe, debug=False) | |
``` | |
Gather triples representing inferred relations and build edges. | |
Make sure to call beforehand: `TextGraphs.collect_graph_elements()` | |
* `pipe` : `textgraphs.pipe.Pipeline` | |
configured pipeline for this document | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `typing.List[textgraphs.elem.Edge]` | |
a list of the inferred `Edge` objects | |
--- | |
#### [`calc_phrase_ranks` method](#textgraphs.TextGraphs.calc_phrase_ranks) | |
[*\[source\]*]( | |
```python | |
calc_phrase_ranks(pr_alpha=0.85, debug=False) | |
``` | |
Calculate the weights for each node in the _lemma graph_, then | |
stack-rank the nodes so that entities have priority over lemmas. | |
Phrase ranks are normalized to sum to 1.0 and these now represent | |
the ranked entities extracted from the document. | |
Make sure to call beforehand: `TextGraphs.construct_lemma_graph()` | |
* `pr_alpha` : `float` | |
optional `alpha` parameter for the PageRank algorithm | |
* `debug` : `bool` | |
debugging flag | |
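The ranking step can be illustrated with a small, dependency-free sketch. It assumes a plain power-iteration PageRank and a simple sort in which entities precede lemmas. The `pagerank` and `stack_rank` helpers and the sample graph are hypothetical, and the library's own calculation may differ in its details.

```python
from typing import Dict, List, Tuple


def pagerank(
    edges: List[Tuple[str, str]], alpha: float = 0.85, iters: int = 50
) -> Dict[str, float]:
    """Power-iteration PageRank over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [dst for src, dst in edges if src == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}

    for _ in range(iters):
        # rank held by dangling nodes (no out-links) is spread evenly
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - alpha) / len(nodes) + alpha * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += alpha * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


def stack_rank(
    rank: Dict[str, float], is_entity: Dict[str, bool]
) -> List[Tuple[str, float]]:
    """Normalize ranks to sum to 1.0, then list entities ahead of lemmas."""
    total = sum(rank.values())
    normalized = {n: r / total for n, r in rank.items()}
    return sorted(
        normalized.items(),
        key=lambda item: (not is_entity[item[0]], -item[1]),
    )


edges = [("herzog", "film"), ("film", "herzog"), ("munich", "herzog"), ("film", "munich")]
is_entity = {"herzog": True, "munich": True, "film": False}
ranked = stack_rank(pagerank(edges, alpha=0.85), is_entity)
```

Here `film` is the only lemma, so it lands last regardless of its weight, while the two entities are ordered by rank.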
--- | |
#### [`get_phrases` method](#textgraphs.TextGraphs.get_phrases) | |
[*\[source\]*]( | |
```python | |
get_phrases() | |
``` | |
Return the entities extracted from the document. | |
Make sure to call beforehand: `TextGraphs.calc_phrase_ranks()` | |
* *yields* : | |
extracted entities | |
--- | |
#### [`get_phrases_as_df` method](#textgraphs.TextGraphs.get_phrases_as_df) | |
[*\[source\]*]( | |
```python | |
get_phrases_as_df() | |
``` | |
Return the ranked extracted entities as a dataframe. | |
Make sure to call beforehand: `TextGraphs.calc_phrase_ranks()` | |
* *returns* : `pandas.core.frame.DataFrame` | |
a `pandas.DataFrame` of the extracted entities | |
--- | |
#### [`export_rdf` method](#textgraphs.TextGraphs.export_rdf) | |
[*\[source\]*]( | |
```python | |
export_rdf(lang="en") | |
``` | |
Extract the entities and relations which have IRIs as RDF triples. | |
* `lang` : `str` | |
language identifier | |
* *returns* : `str` | |
RDF triples in N3 (Turtle) format, as a string
--- | |
#### [`denormalize_iri` method](#textgraphs.TextGraphs.denormalize_iri) | |
[*\[source\]*]( | |
```python | |
denormalize_iri(uri_ref) | |
``` | |
Discern between a parsed entity and a linked entity. | |
* *returns* : `str` | |
_lemma_key_ for a parsed entity, the full IRI for a linked entity | |
--- | |
#### [`load_bootstrap_ttl` method](#textgraphs.TextGraphs.load_bootstrap_ttl) | |
[*\[source\]*]( | |
```python | |
load_bootstrap_ttl(ttl_str, debug=False) | |
``` | |
Parse a TTL string with an RDF semantic graph representation to load | |
bootstrap definitions for the _lemma graph_ prior to parsing, e.g., | |
for synonyms. | |
* `ttl_str` : `str` | |
RDF triples in TTL (Turtle/N3) format | |
* `debug` : `bool` | |
debugging flag | |
--- | |
#### [`export_kuzu` method](#textgraphs.TextGraphs.export_kuzu) | |
[*\[source\]*]( | |
```python | |
export_kuzu(zip_name="", debug=False) | |
``` | |
Export a labeled property graph for KùzuDB (openCypher). | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `str` | |
name of the generated ZIP file | |
## [`SimpleGraph` class](#SimpleGraph) | |
An in-memory graph used to build a `MultiDiGraph` in NetworkX. | |
--- | |
#### [`__init__` method](#textgraphs.SimpleGraph.__init__) | |
[*\[source\]*]( | |
```python | |
__init__() | |
``` | |
Constructor. | |
--- | |
#### [`reset` method](#textgraphs.SimpleGraph.reset) | |
[*\[source\]*]( | |
```python | |
reset() | |
``` | |
Re-initialize the data structures, resetting all but the configuration. | |
--- | |
#### [`make_node` method](#textgraphs.SimpleGraph.make_node) | |
[*\[source\]*]( | |
```python | |
make_node(tokens, key, span, kind, text_id, para_id, sent_id, label=None, length=1, linked=True) | |
``` | |
Lookup and return a `Node` object. | |
By default, link matching keys into the same node. | |
Otherwise instantiate a new node if it does not exist already. | |
* `tokens` : `typing.List[textgraphs.elem.Node]` | |
list of parsed tokens | |
* `key` : `str` | |
lemma key (invariant) | |
* `span` : `spacy.tokens.token.Token` | |
token span for the parsed entity | |
* `kind` : `<enum 'NodeEnum'>` | |
the kind of this `Node` object | |
* `text_id` : `int` | |
text (top-level document) identifier | |
* `para_id` : `int` | |
paragraph identifier
* `sent_id` : `int` | |
sentence identifier | |
* `label` : `typing.Optional[str]` | |
node label (for a new object) | |
* `length` : `int` | |
length of token span | |
* `linked` : `bool` | |
flag for whether this links to an entity | |
* *returns* : `textgraphs.elem.Node` | |
the constructed `Node` object | |
--- | |
#### [`make_edge` method](#textgraphs.SimpleGraph.make_edge) | |
[*\[source\]*]( | |
```python | |
make_edge(src_node, dst_node, kind, rel, prob, key=None, debug=False) | |
``` | |
Lookup an edge, creating a new one if it does not exist already, | |
and increment the count if it does. | |
* `src_node` : `textgraphs.elem.Node` | |
source node in the triple | |
* `dst_node` : `textgraphs.elem.Node` | |
destination node in the triple | |
* `kind` : `<enum 'RelEnum'>` | |
the kind of this `Edge` object | |
* `rel` : `str` | |
relation label | |
* `prob` : `float` | |
probability of this `Edge` within the graph | |
* `key` : `typing.Optional[str]` | |
lemma key (invariant); generate a key if this is not provided | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `typing.Optional[textgraphs.elem.Edge]` | |
the constructed `Edge` object; this may be `None` if the input parameters indicate skipping the edge | |
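The lookup-or-create behavior described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `SketchEdge`, `EdgeIndex`, the dotted key format, and the rule that self-loops are skipped are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SketchEdge:
    src: str
    dst: str
    rel: str
    prob: float
    count: int = 1


class EdgeIndex:
    """Keeps one edge per key and counts repeated sightings."""

    def __init__(self) -> None:
        self.edges: Dict[str, SketchEdge] = {}

    def make_edge(
        self, src: str, dst: str, rel: str, prob: float, key: Optional[str] = None
    ) -> Optional[SketchEdge]:
        if src == dst:
            return None  # assumption: self-loops are one reason to skip an edge

        if key is None:
            key = ".".join([src, rel, dst])  # hypothetical generated key

        edge = self.edges.get(key)
        if edge is None:
            edge = SketchEdge(src, dst, rel, prob)
            self.edges[key] = edge
        else:
            edge.count += 1  # existing edge: only the count changes
        return edge


index = EdgeIndex()
first = index.make_edge("herzog", "munich", "born_in", 0.9)
second = index.make_edge("herzog", "munich", "born_in", 0.9)
skipped = index.make_edge("herzog", "herzog", "same_as", 1.0)
```

Both calls with the same endpoints and relation return the same object, which now carries a count of 2.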
--- | |
#### [`dump_lemma_graph` method](#textgraphs.SimpleGraph.dump_lemma_graph) | |
[*\[source\]*]( | |
```python | |
dump_lemma_graph() | |
``` | |
Dump the _lemma graph_ as a JSON string in _node-link_ format, | |
suitable for serialization and subsequent use in JavaScript, | |
Neo4j, Graphistry, etc. | |
Make sure to call beforehand: `TextGraphs.calc_phrase_ranks()` | |
* *returns* : `str` | |
a JSON representation of the exported _lemma graph_ in _node-link_ format
--- | |
#### [`load_lemma_graph` method](#textgraphs.SimpleGraph.load_lemma_graph) | |
[*\[source\]*]( | |
```python | |
load_lemma_graph(json_str, debug=False) | |
``` | |
Load a _lemma graph_ from a JSON string, i.e.,
a JSON representation of the exported _lemma graph_ in _node-link_ format.
* `debug` : `bool` | |
debugging flag | |
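For orientation, the _node-link_ layout used by the dump and load methods can be sketched with the standard `json` module. The field names below follow the layout commonly produced by NetworkX for a directed multigraph. The attribute names on nodes and links, and the two helper functions, are assumptions made for illustration.

```python
import json
from typing import Dict, List, Tuple


def dump_node_link(nodes: List[Dict], links: List[Dict]) -> str:
    """Serialize nodes and links as a node-link JSON string."""
    return json.dumps(
        {
            "directed": True,
            "multigraph": True,
            "graph": {},
            "nodes": nodes,
            "links": links,
        },
        indent=2,
        sort_keys=True,
    )


def load_node_link(json_str: str) -> Tuple[List[Dict], List[Dict]]:
    """Parse a node-link JSON string back into nodes and links."""
    data = json.loads(json_str)
    return data["nodes"], data["links"]


nodes = [{"id": 0, "label": "herzog"}, {"id": 1, "label": "munich"}]
links = [{"source": 0, "target": 1, "key": 0, "rel": "born_in"}]

json_str = dump_node_link(nodes, links)
loaded_nodes, loaded_links = load_node_link(json_str)
```

Because the format is plain JSON, the same string can be handed to JavaScript tooling or to a graph database loader without further conversion.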
## [`Node` class](#Node) | |
A data class representing one node, i.e., an extracted phrase. | |
--- | |
#### [`__repr__` method](#textgraphs.Node.__repr__) | |
[*\[source\]*]( | |
```python | |
__repr__() | |
``` | |
--- | |
#### [`get_linked_label` method](#textgraphs.Node.get_linked_label) | |
[*\[source\]*]( | |
```python | |
get_linked_label() | |
``` | |
When this node has a linked entity, return that IRI. | |
Otherwise return its `label` value. | |
* *returns* : `typing.Optional[str]` | |
a label for the linked entity | |
--- | |
#### [`get_name` method](#textgraphs.Node.get_name) | |
[*\[source\]*]( | |
```python | |
get_name() | |
``` | |
Return a brief name for the graphical depiction of this Node. | |
* *returns* : `str` | |
brief label to be used in a graph | |
--- | |
#### [`get_stacked_count` method](#textgraphs.Node.get_stacked_count) | |
[*\[source\]*]( | |
```python | |
get_stacked_count() | |
``` | |
Return a modified count, to redact verbs and linked entities from | |
the stack-rank partitions. | |
* *returns* : `int` | |
count, used for re-ranking extracted entities | |
--- | |
#### [`get_pos` method](#textgraphs.Node.get_pos) | |
[*\[source\]*]( | |
```python | |
get_pos() | |
``` | |
Generate a position span for `OpenNRE`. | |
* *returns* : `typing.Tuple[int, int]` | |
a position span needed for `OpenNRE` relation extraction | |
## [`Edge` class](#Edge) | |
A data class representing an edge between two nodes. | |
--- | |
#### [`__repr__` method](#textgraphs.Edge.__repr__) | |
[*\[source\]*]( | |
```python | |
__repr__() | |
``` | |
## [`EnumBase` class](#EnumBase) | |
A mixin for Enum codecs. | |
## [`NodeEnum` class](#NodeEnum) | |
Enumeration for the kinds of node categories.
## [`RelEnum` class](#RelEnum) | |
Enumeration for the kinds of edge relations.
## [`PipelineFactory` class](#PipelineFactory) | |
Factory pattern for building a pipeline, which is one of the more | |
expensive operations with `spaCy` | |
--- | |
#### [`__init__` method](#textgraphs.PipelineFactory.__init__) | |
[*\[source\]*]( | |
```python | |
__init__(spacy_model="en_core_web_sm", ner=None, kg=<textgraphs.pipe.KnowledgeGraph object at 0x130529960>, infer_rels=[]) | |
``` | |
Constructor which instantiates the `spaCy` pipelines: | |
* `tok_pipe` -- regular generator for parsed tokens | |
* `ner_pipe` -- with entities merged | |
* `aux_pipe` -- spotlight entity linking | |
which will be needed for parsing and entity linking. | |
* `spacy_model` : `str` | |
the specific model to use in `spaCy` pipelines | |
* `ner` : `typing.Optional[textgraphs.pipe.Component]` | |
optional custom NER component | |
* `kg` : `textgraphs.pipe.KnowledgeGraph` | |
knowledge graph used for entity linking | |
* `infer_rels` : `typing.List[textgraphs.pipe.InferRel]` | |
a list of components for inferring relations | |
--- | |
#### [`create_pipeline` method](#textgraphs.PipelineFactory.create_pipeline) | |
[*\[source\]*]( | |
```python | |
create_pipeline(text_input) | |
``` | |
Instantiate the document pipelines needed to parse the input text. | |
* `text_input` : `str` | |
raw text to be parsed | |
* *returns* : `textgraphs.pipe.Pipeline` | |
a configured `Pipeline` object | |
## [`Pipeline` class](#Pipeline) | |
Manage parsing of a document, which is assumed to be paragraph-sized. | |
--- | |
#### [`__init__` method](#textgraphs.Pipeline.__init__) | |
[*\[source\]*]( | |
```python | |
__init__(text_input, tok_pipe, ner_pipe, aux_pipe, kg, infer_rels) | |
``` | |
Constructor. | |
* `text_input` : `str` | |
raw text to be parsed | |
* `tok_pipe` : `spacy.language.Language` | |
the `spaCy.Language` pipeline used for tallying individual tokens | |
* `ner_pipe` : `spacy.language.Language` | |
the `spaCy.Language` pipeline used for tallying named entities | |
* `aux_pipe` : `spacy.language.Language` | |
the `spaCy.Language` pipeline used for auxiliary components (e.g., `DBPedia Spotlight`) | |
* `kg` : `textgraphs.pipe.KnowledgeGraph` | |
knowledge graph used for entity linking | |
* `infer_rels` : `typing.List[textgraphs.pipe.InferRel]` | |
a list of components for inferring relations | |
--- | |
#### [`get_lemma_key` classmethod](#textgraphs.Pipeline.get_lemma_key) | |
[*\[source\]*]( | |
```python | |
get_lemma_key(span, placeholder=False) | |
``` | |
Compose a unique, invariant lemma key for the given span. | |
* `span` : `typing.Union[spacy.tokens.span.Span, spacy.tokens.token.Token]` | |
span of tokens within the lemma | |
* `placeholder` : `bool` | |
flag for whether to create a placeholder | |
* *returns* : `str` | |
a composed lemma key | |
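To show why an invariant key is useful, here is a minimal sketch that composes a key from lowercased lemmas and part-of-speech tags. The key format, the `Token` tuple, and `sketch_lemma_key` are hypothetical and are not taken from the library. The point is only that different surface forms of the same phrase map to one key.

```python
from typing import List, Tuple

# (lemma, part-of-speech tag); a stand-in for a parsed token
Token = Tuple[str, str]


def sketch_lemma_key(tokens: List[Token]) -> str:
    """Join normalized lemmas and POS tags into one dotted key."""
    parts: List[str] = []
    for lemma, pos in tokens:
        parts.extend([lemma.strip().lower(), pos])
    return ".".join(parts)


key_a = sketch_lemma_key([("Film", "NOUN"), ("director", "NOUN")])
key_b = sketch_lemma_key([("film ", "NOUN"), ("Director", "NOUN")])
```

Both calls produce the same key, so the two mentions would resolve to a single node in the graph.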
--- | |
#### [`get_ent_lemma_keys` method](#textgraphs.Pipeline.get_ent_lemma_keys) | |
[*\[source\]*]( | |
```python | |
get_ent_lemma_keys() | |
``` | |
Iterate through the fully qualified lemma keys for an extracted entity. | |
* *yields* : | |
the lemma keys within an extracted entity | |
--- | |
#### [`link_noun_chunks` method](#textgraphs.Pipeline.link_noun_chunks) | |
[*\[source\]*]( | |
```python | |
link_noun_chunks(nodes, debug=False) | |
``` | |
Link any noun chunks which are not already subsumed by named entities. | |
* `nodes` : `dict` | |
dictionary of `Node` objects in the graph | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `typing.List[textgraphs.elem.NounChunk]` | |
a list of identified noun chunks which are novel | |
--- | |
#### [`iter_entity_pairs` method](#textgraphs.Pipeline.iter_entity_pairs) | |
[*\[source\]*]( | |
```python | |
iter_entity_pairs(pipe_graph, max_skip, debug=True) | |
``` | |
Iterator for entity pairs for which the algorithm infers relations. | |
* `pipe_graph` : `networkx.classes.multigraph.MultiGraph` | |
a `networkx.MultiGraph` representation of the graph, reused for graph algorithms | |
* `max_skip` : `int` | |
maximum distance between entities for inferred relations | |
* `debug` : `bool` | |
debugging flag | |
* *yields* : | |
pairs of entities within a range, e.g., to use for relation extraction | |
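One way to read "pairs of entities within a range" is as pairs whose shortest-path distance in the graph is at most `max_skip`. The sketch below takes that reading as an assumption and uses a breadth-first search over a plain adjacency dictionary. The `hop_distance` and `iter_pairs_within` helpers are hypothetical, and the library's traversal may differ.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterator, List, Optional, Tuple


def hop_distance(adj: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Breadth-first search for the number of hops between two nodes."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return dist
        for nbr in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return None  # the two nodes are not connected


def iter_pairs_within(
    adj: Dict[str, List[str]], entities: List[str], max_skip: int
) -> Iterator[Tuple[str, str, int]]:
    """Yield entity pairs whose hop distance does not exceed max_skip."""
    for src, dst in combinations(entities, 2):
        dist = hop_distance(adj, src, dst)
        if dist is not None and dist <= max_skip:
            yield src, dst, dist


# an undirected chain a - b - c - d - e, stored symmetrically
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
pairs = list(iter_pairs_within(adj, ["a", "c", "e"], max_skip=2))
```

With `max_skip=2`, the pair `("a", "e")` is dropped because the two entities are four hops apart.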
## [`Component` class](#Component) | |
Abstract base class for a `spaCy` pipeline component. | |
--- | |
#### [`augment_pipe` method](#textgraphs.Component.augment_pipe) | |
[*\[source\]*]( | |
```python | |
augment_pipe(factory) | |
``` | |
Encapsulate a `spaCy` call to `add_pipe()` configuration. | |
* `factory` : `PipelineFactory` | |
a `PipelineFactory` used to configure components | |
## [`NERSpanMarker` class](#NERSpanMarker) | |
Configures a `spaCy` pipeline component for `SpanMarkerNER` | |
--- | |
#### [`__init__` method](#textgraphs.NERSpanMarker.__init__) | |
[*\[source\]*]( | |
```python | |
__init__(ner_model="tomaarsen/span-marker-roberta-large-ontonotes5") | |
``` | |
Constructor. | |
* `ner_model` : `str` | |
model to be used in `SpanMarker` | |
--- | |
#### [`augment_pipe` method](#textgraphs.NERSpanMarker.augment_pipe) | |
[*\[source\]*]( | |
```python | |
augment_pipe(factory) | |
``` | |
Encapsulate a `spaCy` call to `add_pipe()` configuration. | |
* `factory` : `textgraphs.pipe.PipelineFactory` | |
the `PipelineFactory` used to configure this pipeline component | |
## [`NounChunk` class](#NounChunk) | |
A data class representing one noun chunk, i.e., a candidate as an extracted phrase. | |
--- | |
#### [`__repr__` method](#textgraphs.NounChunk.__repr__) | |
[*\[source\]*]( | |
```python | |
__repr__() | |
``` | |
## [`KnowledgeGraph` class](#KnowledgeGraph) | |
Base class for a _knowledge graph_ interface. | |
--- | |
#### [`augment_pipe` method](#textgraphs.KnowledgeGraph.augment_pipe) | |
[*\[source\]*]( | |
```python | |
augment_pipe(factory) | |
``` | |
Encapsulate a `spaCy` call to `add_pipe()` configuration. | |
* `factory` : `PipelineFactory` | |
a `PipelineFactory` used to configure components | |
--- | |
#### [`remap_ner` method](#textgraphs.KnowledgeGraph.remap_ner) | |
[*\[source\]*]( | |
```python | |
remap_ner(label) | |
``` | |
Remap the OntoTypes4 values from NER output to more general-purpose IRIs. | |
* `label` : `typing.Optional[str]` | |
input NER label, an `OntoTypes4` value | |
* *returns* : `typing.Optional[str]` | |
an IRI for the named entity | |
--- | |
#### [`normalize_prefix` method](#textgraphs.KnowledgeGraph.normalize_prefix) | |
[*\[source\]*]( | |
```python | |
normalize_prefix(iri, debug=False) | |
``` | |
Normalize the given IRI to use standard namespace prefixes. | |
* `iri` : `str` | |
input IRI, in fully-qualified domain representation | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `str` | |
the compact IRI representation, using an RDF namespace prefix | |
--- | |
#### [`perform_entity_linking` method](#textgraphs.KnowledgeGraph.perform_entity_linking) | |
[*\[source\]*]( | |
```python | |
perform_entity_linking(graph, pipe, debug=False) | |
``` | |
Perform _entity linking_ based on "spotlight" and other services. | |
* `graph` : `textgraphs.graph.SimpleGraph` | |
source graph | |
* `pipe` : `Pipeline` | |
configured pipeline for the current document | |
* `debug` : `bool` | |
debugging flag | |
--- | |
#### [`resolve_rel_iri` method](#textgraphs.KnowledgeGraph.resolve_rel_iri) | |
[*\[source\]*]( | |
```python | |
resolve_rel_iri(rel, lang="en", debug=False) | |
``` | |
Resolve a `rel` string from a _relation extraction_ model which has | |
been trained on this knowledge graph. | |
* `rel` : `str` | |
relation label; these are generally sourced from Wikidata for many RE projects
* `lang` : `str` | |
language identifier | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `typing.Optional[str]` | |
a resolved IRI | |
## [`KGSearchHit` class](#KGSearchHit) | |
A data class representing a hit from a _knowledge graph_ search. | |
--- | |
#### [`__repr__` method](#textgraphs.KGSearchHit.__repr__) | |
[*\[source\]*]( | |
```python | |
__repr__() | |
``` | |
## [`KGWikiMedia` class](#KGWikiMedia) | |
Manage access to WikiMedia-related APIs. | |
--- | |
#### [`__init__` method](#textgraphs.KGWikiMedia.__init__) | |
[*\[source\]*]( | |
```python | |
__init__(spotlight_api="", dbpedia_search_api="", dbpedia_sparql_api="", wikidata_api="", ner_map=OrderedDict([('CARDINAL', {'iri': '', 'definition': 'Numerals that do not fall under another type', 'label': 'cardinal number'}), ('DATE', {'iri': '', 'definition': 'Absolute or relative dates or periods', 'label': 'date'}), ('EVENT', {'iri': '', 'definition': 'Named hurricanes, battles, wars, sports events, etc.', 'label': 'event'}), ('FAC', {'iri': '', 'definition': 'Buildings, airports, highways, bridges, etc.', 'label': 'infrastructure'}), ('GPE', {'iri': '', 'definition': 'Countries, cities, states', 'label': 'country'}), ('LANGUAGE', {'iri': '', 'definition': 'Any named language', 'label': 'language'}), ('LAW', {'iri': '', 'definition': 'Named documents made into laws', 'label': 'law'}), ('LOC', {'iri': '', 'definition': 'Non-GPE locations, mountain ranges, bodies of water', 'label': 'place'}), ('MONEY', {'iri': '', 'definition': 'Monetary values, including unit', 'label': 'money'}), ('NORP', {'iri': '', 'definition': 'Nationalities or religious or political groups', 'label': 'nationality'}), ('ORDINAL', {'iri': '', 'definition': 'Ordinal number, i.e., first, second, etc.', 'label': 'ordinal number'}), ('ORG', {'iri': '', 'definition': 'Companies, agencies, institutions, etc.', 'label': 'organization'}), ('PERCENT', {'iri': '', 'definition': 'Percentage', 'label': 'percentage'}), ('PERSON', {'iri': '', 'definition': 'People, including fictional', 'label': 'person'}), ('PRODUCT', {'iri': '', 'definition': 'Vehicles, weapons, foods, etc. 
(Not services)', 'label': 'product'}), ('QUANTITY', {'iri': '', 'definition': 'Measurements, as of weight or distance', 'label': 'quantity'}), ('TIME', {'iri': '', 'definition': 'Times smaller than a day', 'label': 'time'}), ('WORK OF ART', {'iri': '', 'definition': 'Titles of books, songs, etc.', 'label': 'work of art'})]), ns_prefix=OrderedDict([('dbc', ''), ('dbt', ''), ('dbr', ''), ('yago', ''), ('dbd', ''), ('dbo', ''), ('dbp', ''), ('units', ''), ('dbpedia-commons', ''), ('dbpedia-wikicompany', ''), ('dbpedia-wikidata', ''), ('wd', ''), ('wd_ent', ''), ('rdf', ''), ('schema', ''), ('owl', '')]), min_alias=0.8, min_similarity=0.9) | |
``` | |
Constructor. | |
* `spotlight_api` : `str` | |
`DBPedia Spotlight` API or equivalent local service | |
* `dbpedia_search_api` : `str` | |
`DBPedia Search` API or equivalent local service | |
* `dbpedia_sparql_api` : `str` | |
`DBPedia SPARQL` API or equivalent local service | |
* `wikidata_api` : `str` | |
`Wikidata Search` API or equivalent local service | |
* `ner_map` : `dict` | |
named entity map for standardizing IRIs | |
* `ns_prefix` : `dict` | |
RDF namespace prefixes | |
* `min_alias` : `float` | |
minimum alias probability threshold for accepting linked entities | |
* `min_similarity` : `float` | |
minimum label similarity threshold for accepting linked entities | |
--- | |
#### [`augment_pipe` method](#textgraphs.KGWikiMedia.augment_pipe) | |
[*\[source\]*]( | |
```python | |
augment_pipe(factory) | |
``` | |
Encapsulate a `spaCy` call to `add_pipe()` configuration. | |
* `factory` : `textgraphs.pipe.PipelineFactory` | |
a `PipelineFactory` used to configure components | |
--- | |
#### [`remap_ner` method](#textgraphs.KGWikiMedia.remap_ner) | |
[*\[source\]*]( | |
```python | |
remap_ner(label) | |
``` | |
Remap the OntoTypes4 values from NER output to more general-purpose IRIs. | |
* `label` : `typing.Optional[str]` | |
input NER label, an `OntoTypes4` value | |
* *returns* : `typing.Optional[str]` | |
an IRI for the named entity | |
--- | |
#### [`normalize_prefix` method](#textgraphs.KGWikiMedia.normalize_prefix) | |
[*\[source\]*]( | |
```python | |
normalize_prefix(iri, debug=False) | |
``` | |
Normalize the given IRI using the standard DBPedia namespace prefixes. | |
* `iri` : `str` | |
input IRI, in fully-qualified domain representation | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `str` | |
the compact IRI representation, using an RDF namespace prefix | |
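Prefix normalization amounts to swapping a known namespace for its short prefix. The sketch below assumes a small sample of well-known namespaces and a longest-match rule. The `NS_PREFIX` table and the `compact_iri` helper are illustrative and are not the library's own mapping or method.

```python
from typing import Dict

# illustrative subset of namespaces; the library configures its own mapping
NS_PREFIX: Dict[str, str] = {
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
    "wd": "http://www.wikidata.org/entity/",
}


def compact_iri(iri: str, ns_prefix: Dict[str, str] = NS_PREFIX) -> str:
    """Return a prefixed form of the IRI, or the IRI unchanged if no namespace matches."""
    # try longer namespaces first so that the most specific one wins
    candidates = sorted(ns_prefix.items(), key=lambda kv: len(kv[1]), reverse=True)
    for prefix, namespace in candidates:
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri


compact = compact_iri("http://dbpedia.org/resource/Werner_Herzog")
unchanged = compact_iri("http://example.org/thing/42")
```

An IRI outside the known namespaces passes through untouched, which keeps the function safe to apply to mixed input.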
--- | |
#### [`perform_entity_linking` method](#textgraphs.KGWikiMedia.perform_entity_linking) | |
[*\[source\]*]( | |
```python | |
perform_entity_linking(graph, pipe, debug=False) | |
``` | |
Perform _entity linking_ based on `DBPedia Spotlight` and other services. | |
* `graph` : `textgraphs.graph.SimpleGraph` | |
source graph | |
* `pipe` : `textgraphs.pipe.Pipeline` | |
configured pipeline for the current document | |
* `debug` : `bool` | |
debugging flag | |
--- | |
#### [`resolve_rel_iri` method](#textgraphs.KGWikiMedia.resolve_rel_iri) | |
[*\[source\]*]( | |
```python | |
resolve_rel_iri(rel, lang="en", debug=False) | |
``` | |
Resolve a `rel` string from a _relation extraction_ model which has | |
been trained on this _knowledge graph_, which defaults to using the | |
`WikiMedia` graphs. | |
* `rel` : `str` | |
relation label; these are generally sourced from Wikidata for many RE projects
* `lang` : `str` | |
language identifier | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `typing.Optional[str]` | |
a resolved IRI | |
--- | |
#### [`wikidata_search` method](#textgraphs.KGWikiMedia.wikidata_search) | |
[*\[source\]*]( | |
```python | |
wikidata_search(query, lang="en", debug=False) | |
``` | |
Query the Wikidata search API. | |
* `query` : `str` | |
query string | |
* `lang` : `str` | |
language identifier | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `typing.Optional[textgraphs.elem.KGSearchHit]` | |
search hit, if any | |
--- | |
#### [`dbpedia_search_entity` method](#textgraphs.KGWikiMedia.dbpedia_search_entity) | |
[*\[source\]*]( | |
```python | |
dbpedia_search_entity(query, lang="en", debug=False) | |
``` | |
Perform a DBPedia API search. | |
* `query` : `str` | |
query string | |
* `lang` : `str` | |
language identifier | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `typing.Optional[textgraphs.elem.KGSearchHit]` | |
search hit, if any | |
--- | |
#### [`dbpedia_sparql_query` method](#textgraphs.KGWikiMedia.dbpedia_sparql_query) | |
[*\[source\]*]( | |
```python | |
dbpedia_sparql_query(sparql, debug=False) | |
``` | |
Perform a SPARQL query on DBPedia. | |
* `sparql` : `str` | |
SPARQL query string | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `dict` | |
dictionary of query results | |
--- | |
#### [`dbpedia_wikidata_equiv` method](#textgraphs.KGWikiMedia.dbpedia_wikidata_equiv) | |
[*\[source\]*]( | |
```python | |
dbpedia_wikidata_equiv(dbpedia_iri, debug=False) | |
``` | |
Perform a SPARQL query on DBPedia to find an equivalent Wikidata entity. | |
* `dbpedia_iri` : `str` | |
IRI in DBpedia | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `typing.Optional[str]` | |
equivalent IRI in Wikidata | |
## [`LinkedEntity` class](#LinkedEntity) | |
A data class representing one linked entity. | |
--- | |
#### [`__repr__` method](#textgraphs.LinkedEntity.__repr__) | |
[*\[source\]*]( | |
```python | |
__repr__() | |
``` | |
## [`InferRel` class](#InferRel) | |
Abstract base class for a _relation extraction_ model wrapper. | |
--- | |
#### [`gen_triples_async` method](#textgraphs.InferRel.gen_triples_async) | |
[*\[source\]*]( | |
```python | |
gen_triples_async(pipe, queue, debug=False) | |
``` | |
Infer relations as triples produced to a queue _concurrently_. | |
* `pipe` : `Pipeline` | |
configured pipeline for the current document | |
* `queue` : `asyncio.queues.Queue` | |
queue of inference tasks to be performed | |
* `debug` : `bool` | |
debugging flag | |
--- | |
#### [`gen_triples` method](#textgraphs.InferRel.gen_triples) | |
[*\[source\]*]( | |
```python | |
gen_triples(pipe, debug=False) | |
``` | |
Infer relations as triples through a generator _iteratively_. | |
* `pipe` : `Pipeline` | |
configured pipeline for the current document | |
* `debug` : `bool` | |
debugging flag | |
* *yields* : | |
generated triples | |
## [`InferRel_OpenNRE` class](#InferRel_OpenNRE) | |
Perform relation extraction based on the `OpenNRE` model. | |
--- | |
#### [`__init__` method](#textgraphs.InferRel_OpenNRE.__init__) | |
[*\[source\]*]( | |
```python | |
__init__(model="wiki80_cnn_softmax", max_skip=11, min_prob=0.9) | |
``` | |
Constructor. | |
* `model` : `str` | |
the specific model to be used in `OpenNRE` | |
* `max_skip` : `int` | |
maximum distance between entities for inferred relations | |
* `min_prob` : `float` | |
minimum probability threshold for accepting an inferred relation | |
--- | |
#### [`gen_triples` method](#textgraphs.InferRel_OpenNRE.gen_triples) | |
[*\[source\]*]( | |
```python | |
gen_triples(pipe, debug=False) | |
``` | |
Iterate on entity pairs to drive `OpenNRE`, inferring relations | |
represented as triples which get produced by a generator. | |
* `pipe` : `textgraphs.pipe.Pipeline` | |
configured pipeline for the current document | |
* `debug` : `bool` | |
debugging flag | |
* *yields* : | |
generated triples as candidates for inferred relations | |
## [`InferRel_Rebel` class](#InferRel_Rebel) | |
Perform relation extraction based on the `REBEL` model. | |
--- | |
#### [`__init__` method](#textgraphs.InferRel_Rebel.__init__) | |
[*\[source\]*]( | |
```python | |
__init__(lang="en_XX", mrebel_model="Babelscape/mrebel-large") | |
``` | |
Constructor. | |
* `lang` : `str` | |
language identifier | |
* `mrebel_model` : `str` | |
tokenizer model to be used | |
--- | |
#### [`tokenize_sent` method](#textgraphs.InferRel_Rebel.tokenize_sent) | |
[*\[source\]*]( | |
```python | |
tokenize_sent(text) | |
``` | |
Apply the tokenizer manually, since we need to extract special tokens. | |
* `text` : `str` | |
input text for the sentence to be tokenized | |
* *returns* : `str` | |
extracted tokens | |
--- | |
#### [`extract_triplets_typed` method](#textgraphs.InferRel_Rebel.extract_triplets_typed) | |
[*\[source\]*]( | |
```python | |
extract_triplets_typed(text) | |
``` | |
Parse the generated text and extract its triplets. | |
* `text` : `str` | |
input text for the sentence to use in inference | |
* *returns* : `list` | |
a list of extracted triples | |
--- | |
#### [`gen_triples` method](#textgraphs.InferRel_Rebel.gen_triples) | |
[*\[source\]*]( | |
```python | |
gen_triples(pipe, debug=False) | |
``` | |
Drive `REBEL` to infer relations for each sentence, represented as | |
triples which get produced by a generator. | |
* `pipe` : `textgraphs.pipe.Pipeline` | |
configured pipeline for the current document | |
* `debug` : `bool` | |
debugging flag | |
* *yields* : | |
generated triples as candidates for inferred relations | |
## [`RenderPyVis` class](#RenderPyVis) | |
Render the _lemma graph_ as a `PyVis` network. | |
--- | |
#### [`__init__` method](#textgraphs.RenderPyVis.__init__) | |
[*\[source\]*]( | |
```python | |
__init__(graph, kg) | |
``` | |
Constructor. | |
* `graph` : `textgraphs.graph.SimpleGraph` | |
source graph to be visualized | |
* `kg` : `textgraphs.pipe.KnowledgeGraph` | |
knowledge graph used for entity linking | |
--- | |
#### [`render_lemma_graph` method](#textgraphs.RenderPyVis.render_lemma_graph) | |
[*\[source\]*]( | |
```python | |
render_lemma_graph(debug=True) | |
``` | |
Prepare the structure of the `NetworkX` graph to use for building | |
and returning a `PyVis` network to render. | |
Make sure to call beforehand: `TextGraphs.calc_phrase_ranks()` | |
* `debug` : `bool` | |
debugging flag | |
* *returns* :
a `PyVis` network, prepared for interactive visualization
--- | |
#### [`draw_communities` method](#textgraphs.RenderPyVis.draw_communities) | |
[*\[source\]*]( | |
```python | |
draw_communities(spring_distance=1.4, debug=False) | |
``` | |
Cluster the communities in the _lemma graph_, then draw a | |
`NetworkX` graph of the nodes with a specific color for each
community. | |
Make sure to call beforehand: `TextGraphs.calc_phrase_ranks()` | |
* `spring_distance` : `float` | |
`NetworkX` parameter used to separate clusters visually | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `typing.Dict[int, int]` | |
a map of the calculated communities | |
--- | |
#### [`generate_wordcloud` method](#textgraphs.RenderPyVis.generate_wordcloud) | |
[*\[source\]*]( | |
```python | |
generate_wordcloud(background="black") | |
``` | |
Generate a tag cloud from the given phrases. | |
Make sure to call beforehand: `TextGraphs.calc_phrase_ranks()` | |
* `background` : `str` | |
background color for the rendering | |
* *returns* : `wordcloud.wordcloud.WordCloud` | |
the rendering as a `wordcloud.WordCloud` object, which can be used to generate PNG images, etc. | |
## [`NodeStyle` class](#NodeStyle) | |
Dataclass used for styling PyVis nodes. | |
--- | |
#### [`__setattr__` method](#textgraphs.NodeStyle.__setattr__) | |
[*\[source\]*](<string>#L2) | |
```python | |
__setattr__(name, value) | |
``` | |
## [`GraphOfRelations` class](#GraphOfRelations) | |
Attempt to reproduce results published in | |
"INGRAM: Inductive Knowledge Graph Embedding via Relation Graphs" | |
--- | |
#### [`__init__` method](#textgraphs.GraphOfRelations.__init__) | |
[*\[source\]*]( | |
```python | |
__init__(source) | |
``` | |
Constructor. | |
* `source` : `textgraphs.graph.SimpleGraph` | |
source graph to be transformed | |
--- | |
#### [`load_ingram` method](#textgraphs.GraphOfRelations.load_ingram) | |
[*\[source\]*]( | |
```python | |
load_ingram(json_file, debug=False) | |
``` | |
Load data for a source graph, as illustrated in _lee2023ingram_. | |
* `json_file` : `pathlib.Path` | |
path for the JSON dataset to load | |
* `debug` : `bool` | |
debugging flag | |
--- | |
#### [`seeds` method](#textgraphs.GraphOfRelations.seeds) | |
[*\[source\]*]( | |
```python | |
seeds(debug=False) | |
``` | |
Prepare data for the topological transform illustrated in _lee2023ingram_. | |
* `debug` : `bool` | |
debugging flag | |
--- | |
#### [`trace_source_graph` method](#textgraphs.GraphOfRelations.trace_source_graph) | |
[*\[source\]*]( | |
```python | |
trace_source_graph() | |
``` | |
Output a "seed" representation of the source graph. | |
--- | |
#### [`construct_gor` method](#textgraphs.GraphOfRelations.construct_gor) | |
[*\[source\]*]( | |
```python | |
construct_gor(debug=False) | |
``` | |
Perform the topological transform described by _lee2023ingram_, | |
constructing a _graph of relations_ (GOR) and calculating | |
_affinity scores_ between entities in the GOR based on their | |
definitions: | |
> we measure the affinity between two relations by considering how many | |
entities are shared between them and how frequently they share the same | |
entity | |
* `debug` : `bool` | |
debugging flag | |
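
To make the quoted idea concrete, here is a minimal sketch of scoring relation pairs by the entities they share. It is not the library's implementation and it omits the normalization used in _lee2023ingram_; the function name `sketch_affinity`, the `(head, relation, tail)` tuple layout, and the product-of-counts weighting are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def sketch_affinity(triples):
    """
    Hypothetical helper: score each pair of relations by the entities
    they share, weighted by how often both relations touch each entity.
    """
    # for each relation, count how often each entity appears as head or tail
    rel_entities = defaultdict(Counter)

    for head, rel, tail in triples:
        rel_entities[rel][head] += 1
        rel_entities[rel][tail] += 1

    scores = {}

    for rel_a, rel_b in combinations(sorted(rel_entities), 2):
        shared = rel_entities[rel_a].keys() & rel_entities[rel_b].keys()

        # assumption: multiply the two counts for each shared entity
        score = sum(
            rel_entities[rel_a][ent] * rel_entities[rel_b][ent]
            for ent in shared
        )

        if score > 0:
            scores[(rel_a, rel_b)] = score

    return scores


# toy data, invented for this example
triples = [
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("acme", "located_in", "paris"),
    ("alice", "lives_in", "paris"),
]

scores = sketch_affinity(triples)
```

In this toy graph, `located_in` and `works_at` score highest because `acme` appears twice under `works_at`, which reflects the "how frequently they share the same entity" part of the quote.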
--- | |
#### [`tally_frequencies` classmethod](#textgraphs.GraphOfRelations.tally_frequencies) | |
[*\[source\]*]( | |
```python | |
tally_frequencies(counter) | |
``` | |
Tally the frequency of shared entities. | |
* `counter` : `collections.Counter` | |
`counter` data collection for the rel_b/entity pairs | |
* *returns* : `int` | |
tallied values for one relation | |
--- | |
#### [`get_affinity_scores` method](#textgraphs.GraphOfRelations.get_affinity_scores) | |
[*\[source\]*]( | |
```python | |
get_affinity_scores(debug=False) | |
``` | |
Reproduce metrics based on the example published in _lee2023ingram_. | |
* `debug` : `bool` | |
debugging flag | |
* *returns* : `typing.Dict[tuple, float]` | |
the calculated affinity scores | |
--- | |
#### [`trace_metrics` method](#textgraphs.GraphOfRelations.trace_metrics) | |
[*\[source\]*]( | |
```python | |
trace_metrics(scores) | |
``` | |
Compare the calculated affinity scores with results from a published | |
example. | |
* `scores` : `typing.Dict[tuple, float]` | |
the calculated affinity scores between pairs of relations (i.e., observed values) | |
* *returns* : `pandas.core.frame.DataFrame` | |
a `pandas.DataFrame` where the rows compare expected vs. observed affinity scores | |
--- | |
#### [`render_gor_plt` method](#textgraphs.GraphOfRelations.render_gor_plt) | |
[*\[source\]*]( | |
```python | |
render_gor_plt(scores) | |
``` | |
Visualize the _graph of relations_ using `matplotlib`. | |
* `scores` : `typing.Dict[tuple, float]` | |
the calculated affinity scores between pairs of relations (i.e., observed values) | |
--- | |
#### [`render_gor_pyvis` method](#textgraphs.GraphOfRelations.render_gor_pyvis) | |
[*\[source\]*]( | |
```python | |
render_gor_pyvis(scores) | |
``` | |
Visualize the _graph of relations_ interactively using `PyVis`. | |
* `scores` : `typing.Dict[tuple, float]` | |
the calculated affinity scores between pairs of relations (i.e., observed values) | |
* *returns* : `` | |
a `pyvis.network.Network` representation of the transformed graph | |
## [`TransArc` class](#TransArc) | |
A data class representing one transformed rel-node-rel triple in | |
a _graph of relations_. | |
--- | |
#### [`__repr__` method](#textgraphs.TransArc.__repr__) | |
[*\[source\]*]( | |
```python | |
__repr__() | |
``` | |
## [`RelDir` class](#RelDir) | |
Enumeration for the directions of a relation. | |
## [`SheafSeed` class](#SheafSeed) | |
A data class representing a node from the source graph plus its | |
partial edge, based on a _Sheaf Theory_ decomposition of a graph. | |
--- | |
#### [`__repr__` method](#textgraphs.SheafSeed.__repr__) | |
[*\[source\]*]( | |
```python | |
__repr__() | |
``` | |
## [`Affinity` class](#Affinity) | |
A data class representing the affinity scores from one entity | |
in the transformed _graph of relations_. | |
NB: there are much more efficient ways to calculate these | |
_affinity scores_ using sparse tensor algebra; this approach | |
illustrates the process -- for research and debugging. | |
--- | |
#### [`__repr__` method](#textgraphs.Affinity.__repr__) | |
[*\[source\]*]( | |
```python | |
__repr__() | |
``` | |
--- | |
## [module functions](#textgraphs) | |
--- | |
#### [`calc_quantile_bins` function](#textgraphs.calc_quantile_bins) | |
[*\[source\]*]( | |
```python | |
calc_quantile_bins(num_rows) | |
``` | |
Calculate the bins to use for a quantile stripe, | |
using [`numpy.linspace`]( | |
* `num_rows` : `int` | |
number of rows in the target dataframe | |
* *returns* : `numpy.ndarray` | |
calculated bins, as a `numpy.ndarray` | |
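
As a rough illustration of building quantile bins with `numpy.linspace`, the sketch below returns evenly spaced quantile boundaries in the half-open interval `[0, 1)`. The function name `quantile_bins_sketch` is hypothetical, and the rule that scales the number of bins with the logarithm of the row count is an assumption, not something this reference states.

```python
import math

import numpy as np


def quantile_bins_sketch(num_rows: int) -> np.ndarray:
    """
    Hypothetical helper: evenly spaced quantile boundaries in [0, 1).
    """
    # assumption: grow the number of bins with the log of the row count,
    # and always keep at least one bin
    granularity = max(round(math.log(num_rows) * 4), 1)

    # endpoint=False leaves out 1.0, so each value marks the start of a bin
    return np.linspace(0, 1, num=granularity, endpoint=False)


bins = quantile_bins_sketch(10)
```

With `endpoint=False`, the first boundary is `0.0` and the last one stays strictly below `1.0`.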
--- | |
#### [`get_repo_version` function](#textgraphs.get_repo_version) | |
[*\[source\]*]( | |
```python | |
get_repo_version() | |
``` | |
Access the Git repository information and return items to identify | |
the version/commit running in production. | |
* *returns* : `typing.Tuple[str, str]` | |
version tag and commit hash | |
--- | |
#### [`root_mean_square` function](#textgraphs.root_mean_square) | |
[*\[source\]*]( | |
```python | |
root_mean_square(values) | |
``` | |
Calculate the [*root mean square*]( | |
of the values in the given list. | |
* `values` : `typing.List[float]` | |
list of values to use in the RMS calculation | |
* *returns* : `float` | |
RMS metric as a float | |
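
For reference, the root mean square is the square root of the mean of the squared values. A minimal sketch using only the standard library follows; the name `rms_sketch` is hypothetical, and raising an error on an empty list is an assumption about how that edge case might be handled.

```python
import math
from typing import List


def rms_sketch(values: List[float]) -> float:
    """
    Hypothetical helper: square root of the mean of the squared values.
    """
    if not values:
        # assumption: reject empty input instead of dividing by zero
        raise ValueError("values must not be empty")

    mean_square = sum(v * v for v in values) / len(values)
    return math.sqrt(mean_square)


rms = rms_sketch([3.0, 4.0])
```

Here the mean of the squares is `(9 + 16) / 2 = 12.5`, so the result is the square root of `12.5`.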
--- | |
#### [`stripe_column` function](#textgraphs.stripe_column) | |
[*\[source\]*]( | |
```python | |
stripe_column(values, bins) | |
``` | |
Stripe a column in a dataframe, by interpolating quantiles into a set of discrete indexes. | |
* `values` : `list` | |
list of values to stripe | |
* `bins` : `int` | |
quantile bins; see [`calc_quantile_bins()`](#calc_quantile_bins-function) | |
* *returns* : `numpy.ndarray` | |
the striped column values, as a `numpy.ndarray` | |
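
One way to picture striping is to look up the value at each quantile boundary, then assign every input value the index of the bin it falls into. The sketch below does this with `numpy.quantile` and `numpy.digitize`; choosing those two functions is an assumption, as is the name `stripe_sketch`, and `bins` is passed as an array of quantile boundaries like the one returned by `calc_quantile_bins()`.

```python
import numpy as np


def stripe_sketch(values, bins):
    """
    Hypothetical helper: map each value to the index of its quantile bin.
    """
    # value found at each quantile boundary (linear interpolation by default)
    quantiles = np.quantile(values, bins)

    # index i means: quantiles[i - 1] <= value < quantiles[i]
    return np.digitize(values, quantiles)


# four quantile boundaries: 0.0, 0.25, 0.5, 0.75
bins = np.linspace(0, 1, num=4, endpoint=False)
striped = stripe_sketch([10, 20, 30, 40], bins)
```

For this input the boundaries evaluate to `10`, `17.5`, `25`, and `32.5`, so the four values land in bins `1` through `4`.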
--- | |
## [module types](#textgraphs) | |