Reference: textgraphs package

see copyright/license https://huggingface.co/spaces/DerwenAI/textgraphs/blob/main/README.md

TextGraphs class

Construct a lemma graph from the unstructured text source, then extract ranked phrases using a textgraph algorithm.

infer_relations_async method


infer_relations_async(pipe, debug=False)

Gather triples representing inferred relations and build edges concurrently, by running an async queue. See https://stackoverflow.com/questions/52582685/using-asyncio-queue-for-producer-consumer-flow

Make sure to call beforehand: TextGraphs.collect_graph_elements()

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for this document

  • debug : bool
    debugging flag

  • returns : typing.List[textgraphs.elem.Edge]
    a list of the inferred Edge objects
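
The producer-consumer flow described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the function names, the `Triple` shape, and the sample data are all hypothetical, and each "producer" stands in for one relation-inference component.

```python
import asyncio
from typing import Dict, List, Tuple

# hypothetical triple shape: (source, relation, destination)
Triple = Tuple[str, str, str]


async def produce_triples(triples: List[Triple], queue: asyncio.Queue) -> None:
    """Stand-in for one inference component: put each triple on the queue."""
    for triple in triples:
        await queue.put(triple)
        await asyncio.sleep(0)  # yield control so that producers interleave


async def consume_triples(queue: asyncio.Queue, edges: List[Triple]) -> None:
    """Drain the queue; here an 'edge' is simply the collected triple."""
    while True:
        triple = await queue.get()
        edges.append(triple)
        queue.task_done()


async def gather_edges(sources: Dict[str, List[Triple]]) -> List[Triple]:
    queue: asyncio.Queue = asyncio.Queue()
    edges: List[Triple] = []
    consumer = asyncio.create_task(consume_triples(queue, edges))

    # run all producers concurrently, then wait until every item is processed
    await asyncio.gather(*(produce_triples(t, queue) for t in sources.values()))
    await queue.join()

    # the consumer loops forever, so stop it once the queue is drained
    consumer.cancel()
    await asyncio.gather(consumer, return_exceptions=True)
    return edges


def run_demo() -> List[Triple]:
    # sample data only; the component names are placeholders
    sources = {
        "component_a": [("Werner Herzog", "country of citizenship", "Germany")],
        "component_b": [
            ("Werner Herzog", "occupation", "film director"),
            ("Germany", "continent", "Europe"),
        ],
    }
    return asyncio.run(gather_edges(sources))
```

Calling `run_demo()` returns the triples from both producers in a single list; their relative order depends on how the producers interleave.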

__init__ method


__init__(factory=None, iri_base="https://github.com/DerwenAI/textgraphs/ns/")


  • factory : typing.Optional[textgraphs.pipe.PipelineFactory]
    optional PipelineFactory used to configure components

create_pipeline method



Use the pipeline factory to create a pipeline (e.g., spaCy.Document) for each text input, which is typically paragraph-length.

  • text_input : str
    raw text to be parsed by this pipeline

  • returns : textgraphs.pipe.Pipeline
    a configured pipeline

create_render method



Create an object for rendering the graph in PyVis HTML+JavaScript.

  • returns : textgraphs.vis.RenderPyVis
    a configured RenderPyVis object for generating graph visualizations

collect_graph_elements method


collect_graph_elements(pipe, text_id=0, para_id=0, debug=False)

Collect the elements of a lemma graph from the results of running the textgraph algorithm. These elements include: parse dependencies, lemmas, entities, and noun chunks.

Make sure to call beforehand: TextGraphs.create_pipeline()

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for this document

  • text_id : int
    text (top-level document) identifier

  • para_id : int
    paragraph identifier

  • debug : bool
    debugging flag

construct_lemma_graph method



Construct the base level of the lemma graph from the collected elements. This gets represented in NetworkX as a directed graph with parallel edges.

Make sure to call beforehand: TextGraphs.collect_graph_elements()

  • debug : bool
    debugging flag

perform_entity_linking method


perform_entity_linking(pipe, debug=False)

Perform entity linking based on the KnowledgeGraph object.

Make sure to call beforehand: TextGraphs.collect_graph_elements()

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for this document

  • debug : bool
    debugging flag

infer_relations method


infer_relations(pipe, debug=False)

Gather triples representing inferred relations and build edges.

Make sure to call beforehand: TextGraphs.collect_graph_elements()

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for this document

  • debug : bool
    debugging flag

  • returns : typing.List[textgraphs.elem.Edge]
    a list of the inferred Edge objects

calc_phrase_ranks method


calc_phrase_ranks(pr_alpha=0.85, debug=False)

Calculate the weights for each node in the lemma graph, then stack-rank the nodes so that entities have priority over lemmas.

Phrase ranks are normalized to sum to 1.0 and these now represent the ranked entities extracted from the document.

Make sure to call beforehand: TextGraphs.construct_lemma_graph()

  • pr_alpha : float
    optional alpha parameter for the PageRank algorithm

  • debug : bool
    debugging flag
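
To make the role of `pr_alpha` and the normalization concrete, here is a small pure-Python sketch of PageRank by power iteration, followed by a stack-rank that places entities ahead of other nodes. It is an illustration under stated assumptions rather than the package's code: the function names, the edge-list input, and the rule "entities first, then descending rank" are hypothetical simplifications.

```python
from typing import Dict, List, Set, Tuple


def pagerank(
    edges: List[Tuple[str, str]],
    alpha: float = 0.85,
    iters: int = 100,
) -> Dict[str, float]:
    """Plain power-iteration PageRank over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links: Dict[str, List[str]] = {n: [] for n in nodes}

    for src, dst in edges:
        out_links[src].append(dst)

    size = len(nodes)
    rank = {n: 1.0 / size for n in nodes}

    for _ in range(iters):
        # rank held by dangling nodes (no out-links) is spread evenly
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        nxt = {n: (1.0 - alpha) / size + alpha * dangling / size for n in nodes}

        for src in nodes:
            for dst in out_links[src]:
                nxt[dst] += alpha * rank[src] / len(out_links[src])

        rank = nxt

    return rank


def stack_rank(rank: Dict[str, float], entities: Set[str]) -> List[Tuple[str, float]]:
    """Normalize ranks to sum to 1.0, then list entities before other nodes."""
    total = sum(rank.values())
    normalized = {n: r / total for n, r in rank.items()}

    # assumption: within each partition, order by descending rank
    return sorted(
        normalized.items(),
        key=lambda item: (item[0] not in entities, -item[1]),
    )
```

With this ordering, a node marked as an entity appears ahead of a lemma even when the lemma has the higher raw PageRank weight.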

get_phrases method



Return the entities extracted from the document.

Make sure to call beforehand: TextGraphs.calc_phrase_ranks()

  • yields :
    extracted entities

get_phrases_as_df method



Return the ranked extracted entities as a dataframe.

Make sure to call beforehand: TextGraphs.calc_phrase_ranks()

  • returns : pandas.core.frame.DataFrame
    a pandas.DataFrame of the extracted entities
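
The "Make sure to call beforehand" notes in this class imply a call order. The sketch below strings the documented methods together in one plausible sequence. The helper name `rank_phrases` is hypothetical, and the relative order of `perform_entity_linking()`, `infer_relations()`, and `construct_lemma_graph()` is an assumption, since each is only documented as requiring `collect_graph_elements()`.

```python
from typing import Any


def rank_phrases(tg: Any, text: str, debug: bool = False) -> Any:
    """
    Call the documented TextGraphs methods in prerequisite order.
    `tg` is expected to be a TextGraphs instance, or any object
    exposing the same methods.
    """
    pipe = tg.create_pipeline(text)

    tg.collect_graph_elements(pipe, debug=debug)      # needs create_pipeline()
    tg.perform_entity_linking(pipe, debug=debug)      # needs collect_graph_elements()
    tg.infer_relations(pipe, debug=debug)             # needs collect_graph_elements()
    tg.construct_lemma_graph(debug=debug)             # needs collect_graph_elements()
    tg.calc_phrase_ranks(pr_alpha=0.85, debug=debug)  # needs construct_lemma_graph()

    return tg.get_phrases_as_df()                     # needs calc_phrase_ranks()
```

Because the helper only relies on method names, it can be exercised with a stub object before wiring in a real, configured instance.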

export_rdf method



Extract the entities and relations which have IRIs as RDF triples.

  • lang : str
    language identifier

  • returns : str
    RDF triples in N3 (Turtle) format, as a string

denormalize_iri method



Discern between a parsed entity and a linked entity.

  • returns : str
    lemma_key for a parsed entity, the full IRI for a linked entity

load_bootstrap_ttl method


load_bootstrap_ttl(ttl_str, debug=False)

Parse a TTL string with an RDF semantic graph representation to load bootstrap definitions for the lemma graph prior to parsing, e.g., for synonyms.

  • ttl_str : str
    RDF triples in TTL (Turtle/N3) format

  • debug : bool
    debugging flag

export_kuzu method


export_kuzu(zip_name="lemma.zip", debug=False)

Export a labeled property graph for KùzuDB (openCypher).

  • debug : bool
    debugging flag

  • returns : str
    name of the generated ZIP file

SimpleGraph class

An in-memory graph used to build a MultiDiGraph in NetworkX.

__init__ method




reset method



Re-initialize the data structures, resetting all but the configuration.

make_node method


make_node(tokens, key, span, kind, text_id, para_id, sent_id, label=None, length=1, linked=True)

Look up and return a Node object. By default, link matching keys into the same node. Otherwise, instantiate a new node if it does not exist already.

  • tokens : typing.List[textgraphs.elem.Node]
    list of parsed tokens

  • key : str
    lemma key (invariant)

  • span : spacy.tokens.token.Token
    token span for the parsed entity

  • kind : <enum 'NodeEnum'>
    the kind of this Node object

  • text_id : int
    text (top-level document) identifier

  • para_id : int
    paragraph identifier

  • sent_id : int
    sentence identifier

  • label : typing.Optional[str]
    node label (for a new object)

  • length : int
    length of token span

  • linked : bool
    flag for whether this links to an entity

  • returns : textgraphs.elem.Node
    the constructed Node object

make_edge method


make_edge(src_node, dst_node, kind, rel, prob, key=None, debug=False)

Look up an edge, creating a new one if it does not exist already, or incrementing its count if it does.

  • src_node : textgraphs.elem.Node
    source node in the triple

  • dst_node : textgraphs.elem.Node
    destination node in the triple

  • kind : <enum 'RelEnum'>
    the kind of this Edge object

  • rel : str
    relation label

  • prob : float
    probability of this Edge within the graph

  • key : typing.Optional[str]
    lemma key (invariant); generate a key if this is not provided

  • debug : bool
    debugging flag

  • returns : typing.Optional[textgraphs.elem.Edge]
    the constructed Edge object; this may be None if the input parameters indicate skipping the edge
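
The lookup-or-create behavior can be sketched as a keyed dictionary of edges. Everything here is hypothetical and simplified: `EdgeSketch` and `EdgeIndex` are stand-ins for the real classes, nodes are plain strings, the generated key format is invented, and treating a self-loop as the case where the edge is skipped is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EdgeSketch:
    """Simplified stand-in for an edge record."""
    src: str
    dst: str
    rel: str
    prob: float
    count: int = 1


class EdgeIndex:
    """Keeps one edge per key, counting repeated observations."""

    def __init__(self) -> None:
        self.edges: Dict[str, EdgeSketch] = {}

    def make_edge(
        self,
        src: str,
        dst: str,
        rel: str,
        prob: float,
        key: Optional[str] = None,
    ) -> Optional[EdgeSketch]:
        if src == dst:
            # assumption: a self-loop is one input that gets skipped
            return None

        if key is None:
            # generate a key when none is provided (format is illustrative)
            key = f"{src}.{dst}.{rel}"

        edge = self.edges.get(key)

        if edge is None:
            edge = EdgeSketch(src, dst, rel, prob)
            self.edges[key] = edge
        else:
            edge.count += 1

        return edge
```

Returning the same object on repeated calls means callers can treat `count` as a running tally of how often a relation was observed.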

dump_lemma_graph method



Dump the lemma graph as a JSON string in node-link format, suitable for serialization and subsequent use in JavaScript, Neo4j, Graphistry, etc.

Make sure to call beforehand: TextGraphs.calc_phrase_ranks()

  • returns : str
    a JSON representation of the exported lemma graph in node-link format

load_lemma_graph method


load_lemma_graph(json_str, debug=False)

Load the lemma graph from a JSON string, i.e., a JSON representation of the exported lemma graph in node-link format.

  • debug : bool
    debugging flag
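
For readers unfamiliar with node-link format, the following sketch round-trips a tiny graph through JSON using only the standard library. It assumes the conventional layout with top-level `nodes` and `links` arrays; the helper names and the attributes on each node are hypothetical, and the attributes the package actually serializes may differ.

```python
import json
from typing import Any, Dict, List, Tuple

Link = Tuple[int, int, str]  # (source id, target id, edge key)


def dump_node_link(nodes: List[Dict[str, Any]], links: List[Link]) -> str:
    """Serialize nodes and links as a node-link JSON string."""
    data = {
        "directed": True,
        "multigraph": True,
        "nodes": nodes,
        "links": [{"source": s, "target": t, "key": k} for s, t, k in links],
    }
    return json.dumps(data, sort_keys=True)


def load_node_link(json_str: str) -> Tuple[List[Dict[str, Any]], List[Link]]:
    """Parse a node-link JSON string back into nodes and links."""
    data = json.loads(json_str)
    links = [(item["source"], item["target"], item["key"]) for item in data["links"]]
    return data["nodes"], links
```

A round trip such as `load_node_link(dump_node_link(nodes, links))` should return the same nodes and links that went in.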

Node class

A data class representing one node, i.e., an extracted phrase.

__repr__ method



get_linked_label method



When this node has a linked entity, return that IRI. Otherwise return its label value.

  • returns : typing.Optional[str]
    a label for the linked entity

get_name method



Return a brief name for the graphical depiction of this Node.

  • returns : str
    brief label to be used in a graph

get_stacked_count method



Return a modified count, to redact verbs and linked entities from the stack-rank partitions.

  • returns : int
    count, used for re-ranking extracted entities

get_pos method



Generate a position span for OpenNRE.

  • returns : typing.Tuple[int, int]
    a position span needed for OpenNRE relation extraction

Edge class

A data class representing an edge between two nodes.

__repr__ method



EnumBase class

A mixin for Enum codecs.

NodeEnum class

Enumeration for the kinds of node categories

RelEnum class

Enumeration for the kinds of edge relations

PipelineFactory class

Factory pattern for building a pipeline, which is one of the more expensive operations with spaCy

__init__ method


__init__(spacy_model="en_core_web_sm", ner=None, kg=<textgraphs.pipe.KnowledgeGraph object at 0x130529960>, infer_rels=[])

Constructor which instantiates the spaCy pipelines:

  • tok_pipe -- regular generator for parsed tokens
  • ner_pipe -- with entities merged
  • aux_pipe -- spotlight entity linking

which will be needed for parsing and entity linking.

  • spacy_model : str
    the specific model to use in spaCy pipelines

  • ner : typing.Optional[textgraphs.pipe.Component]
    optional custom NER component

  • kg : textgraphs.pipe.KnowledgeGraph
    knowledge graph used for entity linking

  • infer_rels : typing.List[textgraphs.pipe.InferRel]
    a list of components for inferring relations

create_pipeline method



Instantiate the document pipelines needed to parse the input text.

  • text_input : str
    raw text to be parsed

  • returns : textgraphs.pipe.Pipeline
    a configured Pipeline object

Pipeline class

Manage parsing of a document, which is assumed to be paragraph-sized.

__init__ method


__init__(text_input, tok_pipe, ner_pipe, aux_pipe, kg, infer_rels)


  • text_input : str
    raw text to be parsed

  • tok_pipe : spacy.language.Language
    the spaCy.Language pipeline used for tallying individual tokens

  • ner_pipe : spacy.language.Language
    the spaCy.Language pipeline used for tallying named entities

  • aux_pipe : spacy.language.Language
    the spaCy.Language pipeline used for auxiliary components (e.g., DBPedia Spotlight)

  • kg : textgraphs.pipe.KnowledgeGraph
    knowledge graph used for entity linking

  • infer_rels : typing.List[textgraphs.pipe.InferRel]
    a list of components for inferring relations

get_lemma_key classmethod


get_lemma_key(span, placeholder=False)

Compose a unique, invariant lemma key for the given span.

  • span : typing.Union[spacy.tokens.span.Span, spacy.tokens.token.Token]
    span of tokens within the lemma

  • placeholder : bool
    flag for whether to create a placeholder

  • returns : str
    a composed lemma key

get_ent_lemma_keys method



Iterate through the fully qualified lemma keys for an extracted entity.

  • yields :
    the lemma keys within an extracted entity

link_noun_chunks method


link_noun_chunks(nodes, debug=False)

Link any noun chunks which are not already subsumed by named entities.

  • nodes : dict
    dictionary of Node objects in the graph

  • debug : bool
    debugging flag

  • returns : typing.List[textgraphs.elem.NounChunk]
    a list of identified noun chunks which are novel

iter_entity_pairs method


iter_entity_pairs(pipe_graph, max_skip, debug=True)

Iterator for entity pairs for which the algorithm infers relations.

  • pipe_graph : networkx.classes.multigraph.MultiGraph
    a networkx.MultiGraph representation of the graph, reused for graph algorithms

  • max_skip : int
    maximum distance between entities for inferred relations

  • debug : bool
    debugging flag

  • yields :
    pairs of entities within a range, e.g., to use for relation extraction
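
The idea of limiting candidate pairs by `max_skip` can be shown with a simplified stand-in. Note the assumption: this sketch measures distance as the difference between token offsets, whereas the documented method takes a NetworkX graph, so the real distance is presumably computed over that graph. The function name and input shape are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def iter_pairs_within(
    positions: List[Tuple[str, int]],
    max_skip: int,
) -> Iterator[Tuple[str, str, int]]:
    """
    Yield each pair of entities whose distance is at most `max_skip`.
    `positions` holds (entity, token offset) tuples; using offsets as the
    distance measure is a simplification for illustration.
    """
    for (ent_a, pos_a), (ent_b, pos_b) in combinations(positions, 2):
        distance = abs(pos_a - pos_b)

        if distance <= max_skip:
            yield ent_a, ent_b, distance
```

Bounding the distance keeps the number of candidate pairs passed to a relation extraction model manageable for longer paragraphs.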

Component class

Abstract base class for a spaCy pipeline component.

augment_pipe method



Encapsulate a spaCy call to add_pipe() configuration.

  • factory : PipelineFactory
    a PipelineFactory used to configure components

NERSpanMarker class

Configures a spaCy pipeline component for SpanMarkerNER

__init__ method




  • ner_model : str
    model to be used in SpanMarker

augment_pipe method



Encapsulate a spaCy call to add_pipe() configuration.

  • factory : textgraphs.pipe.PipelineFactory
    the PipelineFactory used to configure this pipeline component

NounChunk class

A data class representing one noun chunk, i.e., a candidate as an extracted phrase.

__repr__ method



KnowledgeGraph class

Base class for a knowledge graph interface.

augment_pipe method



Encapsulate a spaCy call to add_pipe() configuration.

  • factory : PipelineFactory
    a PipelineFactory used to configure components

remap_ner method



Remap the OntoTypes4 values from NER output to more general-purpose IRIs.

  • label : typing.Optional[str]
    input NER label, an OntoTypes4 value

  • returns : typing.Optional[str]
    an IRI for the named entity

normalize_prefix method


normalize_prefix(iri, debug=False)

Normalize the given IRI to use standard namespace prefixes.

  • iri : str
    input IRI, in fully-qualified domain representation

  • debug : bool
    debugging flag

  • returns : str
    the compact IRI representation, using an RDF namespace prefix

perform_entity_linking method


perform_entity_linking(graph, pipe, debug=False)

Perform entity linking based on "spotlight" and other services.

  • graph : textgraphs.graph.SimpleGraph
    source graph

  • pipe : Pipeline
    configured pipeline for the current document

  • debug : bool
    debugging flag

resolve_rel_iri method


resolve_rel_iri(rel, lang="en", debug=False)

Resolve a rel string from a relation extraction model which has been trained on this knowledge graph.

  • rel : str
    relation label; these are generally sourced from Wikidata in many RE projects

  • lang : str
    language identifier

  • debug : bool
    debugging flag

  • returns : typing.Optional[str]
    a resolved IRI

KGSearchHit class

A data class representing a hit from a knowledge graph search.

__repr__ method



KGWikiMedia class

Manage access to WikiMedia-related APIs.

__init__ method


__init__(spotlight_api="https://api.dbpedia-spotlight.org/en", dbpedia_search_api="https://lookup.dbpedia.org/api/search", dbpedia_sparql_api="https://dbpedia.org/sparql", wikidata_api="https://www.wikidata.org/w/api.php", ner_map=OrderedDict([('CARDINAL', {'iri': 'http://dbpedia.org/resource/Cardinal_number', 'definition': 'Numerals that do not fall under another type', 'label': 'cardinal number'}), ('DATE', {'iri': 'http://dbpedia.org/ontology/date', 'definition': 'Absolute or relative dates or periods', 'label': 'date'}), ('EVENT', {'iri': 'http://dbpedia.org/ontology/Event', 'definition': 'Named hurricanes, battles, wars, sports events, etc.', 'label': 'event'}), ('FAC', {'iri': 'http://dbpedia.org/ontology/Infrastructure', 'definition': 'Buildings, airports, highways, bridges, etc.', 'label': 'infrastructure'}), ('GPE', {'iri': 'http://dbpedia.org/ontology/Country', 'definition': 'Countries, cities, states', 'label': 'country'}), ('LANGUAGE', {'iri': 'http://dbpedia.org/ontology/Language', 'definition': 'Any named language', 'label': 'language'}), ('LAW', {'iri': 'http://dbpedia.org/ontology/Law', 'definition': 'Named documents made into laws', 'label': 'law'}), ('LOC', {'iri': 'http://dbpedia.org/ontology/Place', 'definition': 'Non-GPE locations, mountain ranges, bodies of water', 'label': 'place'}), ('MONEY', {'iri': 'http://dbpedia.org/resource/Money', 'definition': 'Monetary values, including unit', 'label': 'money'}), ('NORP', {'iri': 'http://dbpedia.org/ontology/nationality', 'definition': 'Nationalities or religious or political groups', 'label': 'nationality'}), ('ORDINAL', {'iri': 'http://dbpedia.org/resource/Ordinal_number', 'definition': 'Ordinal number, i.e., first, second, etc.', 'label': 'ordinal number'}), ('ORG', {'iri': 'http://dbpedia.org/ontology/Organisation', 'definition': 'Companies, agencies, institutions, etc.', 'label': 'organization'}), ('PERCENT', {'iri': 'http://dbpedia.org/resource/Percentage', 'definition': 'Percentage', 'label': 
'percentage'}), ('PERSON', {'iri': 'http://dbpedia.org/ontology/Person', 'definition': 'People, including fictional', 'label': 'person'}), ('PRODUCT', {'iri': 'http://dbpedia.org/ontology/product', 'definition': 'Vehicles, weapons, foods, etc. (Not services)', 'label': 'product'}), ('QUANTITY', {'iri': 'http://dbpedia.org/resource/Quantity', 'definition': 'Measurements, as of weight or distance', 'label': 'quantity'}), ('TIME', {'iri': 'http://dbpedia.org/ontology/time', 'definition': 'Times smaller than a day', 'label': 'time'}), ('WORK OF ART', {'iri': 'http://dbpedia.org/resource/Work_of_art', 'definition': 'Titles of books, songs, etc.', 'label': 'work of art'})]), ns_prefix=OrderedDict([('dbc', 'http://dbpedia.org/resource/Category:'), ('dbt', 'http://dbpedia.org/resource/Template:'), ('dbr', 'http://dbpedia.org/resource/'), ('yago', 'http://dbpedia.org/class/yago/'), ('dbd', 'http://dbpedia.org/datatype/'), ('dbo', 'http://dbpedia.org/ontology/'), ('dbp', 'http://dbpedia.org/property/'), ('units', 'http://dbpedia.org/units/'), ('dbpedia-commons', 'http://commons.dbpedia.org/resource/'), ('dbpedia-wikicompany', 'http://dbpedia.openlinksw.com/wikicompany/'), ('dbpedia-wikidata', 'http://wikidata.dbpedia.org/resource/'), ('wd', 'http://www.wikidata.org/'), ('wd_ent', 'http://www.wikidata.org/entity/'), ('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'), ('schema', 'https://schema.org/'), ('owl', 'http://www.w3.org/2002/07/owl#')]), min_alias=0.8, min_similarity=0.9)


  • spotlight_api : str
    DBPedia Spotlight API or equivalent local service

  • dbpedia_search_api : str
    DBPedia Search API or equivalent local service

  • dbpedia_sparql_api : str
    DBPedia SPARQL API or equivalent local service

  • wikidata_api : str
    Wikidata Search API or equivalent local service

  • ner_map : dict
    named entity map for standardizing IRIs

  • ns_prefix : dict
    RDF namespace prefixes

  • min_alias : float
    minimum alias probability threshold for accepting linked entities

  • min_similarity : float
    minimum label similarity threshold for accepting linked entities

augment_pipe method



Encapsulate a spaCy call to add_pipe() configuration.

  • factory : textgraphs.pipe.PipelineFactory
    a PipelineFactory used to configure components

remap_ner method



Remap the OntoTypes4 values from NER output to more general-purpose IRIs.

  • label : typing.Optional[str]
    input NER label, an OntoTypes4 value

  • returns : typing.Optional[str]
    an IRI for the named entity
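
As a rough illustration of the remapping, the sketch below looks up a label in a small map whose three entries are copied from the default `ner_map` shown in the constructor signature. The function name is hypothetical, and returning `None` for an unknown label is an assumption, not documented behavior.

```python
from typing import Dict, Optional

# three entries copied from the default `ner_map` in the constructor
NER_MAP: Dict[str, Dict[str, str]] = {
    "PERSON": {"iri": "http://dbpedia.org/ontology/Person", "label": "person"},
    "ORG": {"iri": "http://dbpedia.org/ontology/Organisation", "label": "organization"},
    "GPE": {"iri": "http://dbpedia.org/ontology/Country", "label": "country"},
}


def remap_label(label: Optional[str]) -> Optional[str]:
    """Map an NER label to an IRI, if the label is known."""
    if label is None:
        return None

    entry = NER_MAP.get(label)

    # assumption: unknown labels yield None
    return entry["iri"] if entry is not None else None
```

For example, `remap_label("PERSON")` resolves to the DBPedia ontology IRI for a person.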

normalize_prefix method


normalize_prefix(iri, debug=False)

Normalize the given IRI using the standard DBPedia namespace prefixes.

  • iri : str
    input IRI, in fully-qualified domain representation

  • debug : bool
    debugging flag

  • returns : str
    the compact IRI representation, using an RDF namespace prefix
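
Compacting an IRI amounts to replacing a known namespace with its prefix. The sketch below uses four entries copied from the default `ns_prefix` map in the constructor. The function name is hypothetical, and both the longest-match rule and returning unmatched IRIs unchanged are assumptions made for this example.

```python
from typing import Dict

# four entries copied from the default `ns_prefix` in the constructor
NS_PREFIX: Dict[str, str] = {
    "dbc": "http://dbpedia.org/resource/Category:",
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
    "wd_ent": "http://www.wikidata.org/entity/",
}


def compact_iri(iri: str, ns_prefix: Dict[str, str] = NS_PREFIX) -> str:
    """Replace the longest matching namespace with its prefix."""
    by_length = sorted(ns_prefix.items(), key=lambda kv: len(kv[1]), reverse=True)

    # longest namespace first, so `dbc` wins over `dbr` for categories
    for prefix, namespace in by_length:
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"

    # assumption: an IRI with no known namespace is returned unchanged
    return iri
```

Checking longer namespaces first matters because `dbr` is a strict prefix of `dbc`; without it, category IRIs would be compacted under the wrong prefix.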

perform_entity_linking method


perform_entity_linking(graph, pipe, debug=False)

Perform entity linking based on DBPedia Spotlight and other services.

  • graph : textgraphs.graph.SimpleGraph
    source graph

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for the current document

  • debug : bool
    debugging flag

resolve_rel_iri method


resolve_rel_iri(rel, lang="en", debug=False)

Resolve a rel string from a relation extraction model which has been trained on this knowledge graph, which defaults to using the WikiMedia graphs.

  • rel : str
    relation label; these are generally sourced from Wikidata in many RE projects

  • lang : str
    language identifier

  • debug : bool
    debugging flag

  • returns : typing.Optional[str]
    a resolved IRI

wikidata_search method


wikidata_search(query, lang="en", debug=False)

Query the Wikidata search API.

  • query : str
    query string

  • lang : str
    language identifier

  • debug : bool
    debugging flag

  • returns : typing.Optional[textgraphs.elem.KGSearchHit]
    search hit, if any

dbpedia_search_entity method


dbpedia_search_entity(query, lang="en", debug=False)

Perform a DBPedia API search.

  • query : str
    query string

  • lang : str
    language identifier

  • debug : bool
    debugging flag

  • returns : typing.Optional[textgraphs.elem.KGSearchHit]
    search hit, if any

dbpedia_sparql_query method


dbpedia_sparql_query(sparql, debug=False)

Perform a SPARQL query on DBPedia.

  • sparql : str
    SPARQL query string

  • debug : bool
    debugging flag

  • returns : dict
    dictionary of query results

dbpedia_wikidata_equiv method


dbpedia_wikidata_equiv(dbpedia_iri, debug=False)

Perform a SPARQL query on DBPedia to find an equivalent Wikidata entity.

  • dbpedia_iri : str
    IRI in DBpedia

  • debug : bool
    debugging flag

  • returns : typing.Optional[str]
    equivalent IRI in Wikidata

LinkedEntity class

A data class representing one linked entity.

__repr__ method



InferRel class

Abstract base class for a relation extraction model wrapper.

gen_triples_async method


gen_triples_async(pipe, queue, debug=False)

Infer relations as triples produced to a queue concurrently.

  • pipe : Pipeline
    configured pipeline for the current document

  • queue : asyncio.queues.Queue
    queue of inference tasks to be performed

  • debug : bool
    debugging flag

gen_triples method


gen_triples(pipe, debug=False)

Infer relations as triples through a generator iteratively.

  • pipe : Pipeline
    configured pipeline for the current document

  • debug : bool
    debugging flag

  • yields :
    generated triples

InferRel_OpenNRE class

Perform relation extraction based on the OpenNRE model. https://github.com/thunlp/OpenNRE

__init__ method


__init__(model="wiki80_cnn_softmax", max_skip=11, min_prob=0.9)


  • model : str
    the specific model to be used in OpenNRE

  • max_skip : int
    maximum distance between entities for inferred relations

  • min_prob : float
    minimum probability threshold for accepting an inferred relation

gen_triples method


gen_triples(pipe, debug=False)

Iterate on entity pairs to drive OpenNRE, inferring relations represented as triples which get produced by a generator.

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for the current document

  • debug : bool
    debugging flag

  • yields :
    generated triples as candidates for inferred relations

InferRel_Rebel class

Perform relation extraction based on the REBEL model. https://github.com/Babelscape/rebel https://huggingface.co/spaces/Babelscape/mrebel-demo

__init__ method


__init__(lang="en_XX", mrebel_model="Babelscape/mrebel-large")


  • lang : str
    language identifier

  • mrebel_model : str
    tokenizer model to be used

tokenize_sent method



Apply the tokenizer manually, since we need to extract special tokens.

  • text : str
    input text for the sentence to be tokenized

  • returns : str
    extracted tokens

extract_triplets_typed method



Parse the generated text and extract its triplets.

  • text : str
    input text for the sentence to use in inference

  • returns : list
    a list of extracted triples

gen_triples method


gen_triples(pipe, debug=False)

Drive REBEL to infer relations for each sentence, represented as triples which get produced by a generator.

  • pipe : textgraphs.pipe.Pipeline
    configured pipeline for the current document

  • debug : bool
    debugging flag

  • yields :
    generated triples as candidates for inferred relations

RenderPyVis class

Render the lemma graph as a PyVis network.

__init__ method


__init__(graph, kg)


  • graph : textgraphs.graph.SimpleGraph
    source graph to be visualized

  • kg : textgraphs.pipe.KnowledgeGraph
    knowledge graph used for entity linking

render_lemma_graph method



Prepare the structure of the NetworkX graph to use for building and returning a PyVis network to render.

Make sure to call beforehand: TextGraphs.calc_phrase_ranks()

  • debug : bool
    debugging flag

  • returns : pyvis.network.Network
    a pyvis.network.Network interactive visualization

draw_communities method


draw_communities(spring_distance=1.4, debug=False)

Cluster the communities in the lemma graph, then draw a NetworkX graph of the nodes with a specific color for each community.

Make sure to call beforehand: TextGraphs.calc_phrase_ranks()

  • spring_distance : float
    NetworkX parameter used to separate clusters visually

  • debug : bool
    debugging flag

  • returns : typing.Dict[int, int]
    a map of the calculated communities

generate_wordcloud method



Generate a tag cloud from the given phrases.

Make sure to call beforehand: TextGraphs.calc_phrase_ranks()

  • background : str
    background color for the rendering

  • returns : wordcloud.wordcloud.WordCloud
    the rendering as a wordcloud.WordCloud object, which can be used to generate PNG images, etc.

NodeStyle class

Dataclass used for styling PyVis nodes.

__setattr__ method


__setattr__(name, value)

GraphOfRelations class

Attempt to reproduce results published in "INGRAM: Inductive Knowledge Graph Embedding via Relation Graphs" https://arxiv.org/abs/2305.19987

__init__ method




  • source : textgraphs.graph.SimpleGraph
    source graph to be transformed

load_ingram method


load_ingram(json_file, debug=False)

Load data for a source graph, as illustrated in lee2023ingram

  • json_file : pathlib.Path
    path for the JSON dataset to load

  • debug : bool
    debugging flag

seeds method



Prep data for the topological transform illustrated in lee2023ingram

  • debug : bool
    debugging flag

trace_source_graph method



Output a "seed" representation of the source graph.

construct_gor method



Perform the topological transform described by lee2023ingram, constructing a graph of relations (GOR) and calculating affinity scores between entities in the GOR based on their definitions:

we measure the affinity between two relations by considering how many entities are shared between them and how frequently they share the same entity

  • debug : bool
    debugging flag

tally_frequencies classmethod



Tally the frequency of shared entities.

  • counter : collections.Counter
    counter data collection for the rel_b/entity pairs

  • returns : int
    tallied values for one relation

get_affinity_scores method



Reproduce metrics based on the example published in lee2023ingram

  • debug : bool
    debugging flag

  • returns : typing.Dict[tuple, float]
    the calculated affinity scores

trace_metrics method



Compare the calculated affinity scores with results from a published example.

  • scores : typing.Dict[tuple, float]
    the calculated affinity scores between pairs of relations (i.e., observed values)

  • returns : pandas.core.frame.DataFrame
    a pandas.DataFrame where the rows compare expected vs. observed affinity scores

render_gor_plt method



Visualize the graph of relations using matplotlib

  • scores : typing.Dict[tuple, float]
    the calculated affinity scores between pairs of relations (i.e., observed values)

render_gor_pyvis method



Visualize the graph of relations interactively using PyVis

  • scores : typing.Dict[tuple, float]
    the calculated affinity scores between pairs of relations (i.e., observed values)

  • returns : pyvis.network.Network
    a pyvis.network.Network representation of the transformed graph

TransArc class

A data class representing one transformed rel-node-rel triple in a graph of relations.

__repr__ method



RelDir class

Enumeration for the directions of a relation.

SheafSeed class

A data class representing a node from the source graph plus its partial edge, based on a Sheaf Theory decomposition of a graph.

__repr__ method



Affinity class

A data class representing the affinity scores from one entity in the transformed graph of relations.

NB: there are much more efficient ways to calculate these affinity scores using sparse tensor algebra; this approach illustrates the process -- for research and debugging.

__repr__ method



module functions

calc_quantile_bins function



Calculate the bins to use for a quantile stripe, using numpy.linspace

  • num_rows : int
    number of rows in the target dataframe

  • returns : numpy.ndarray
    calculated bins, as a numpy.ndarray

get_repo_version function



Access the Git repository information and return items to identify the version/commit running in production.

  • returns : typing.Tuple[str, str]
    version tag and commit hash

root_mean_square function



Calculate the root mean square of the values in the given list.

  • values : typing.List[float]
    list of values to use in the RMS calculation

  • returns : float
    RMS metric as a float
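
The root mean square is the square root of the mean of the squared values. A minimal standalone sketch follows; the name `rms_sketch` is hypothetical, and raising on an empty list is an assumption about edge-case handling.

```python
import math
from typing import List


def rms_sketch(values: List[float]) -> float:
    """Square root of the mean of the squared values."""
    if not values:
        # assumption: an empty list is treated as an error
        raise ValueError("values must not be empty")

    return math.sqrt(sum(v * v for v in values) / len(values))
```

For a list of identical values, the result equals that value, which makes a quick sanity check.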

stripe_column function


stripe_column(values, bins)

Stripe a column in a dataframe, by interpolating quantiles into a set of discrete indexes.

  • values : list
    list of values to stripe

  • bins : int
    quantile bins; see calc_quantile_bins()

  • returns : numpy.ndarray
    the striped column values, as a numpy.ndarray
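
Striping maps each value to the index of the quantile band it falls into. The pure-Python sketch below shows the idea without NumPy. It is an approximation under stated assumptions: the helper names are hypothetical, the quantile levels are evenly spaced over [0, 1] in the manner of `numpy.linspace`, and the nearest-rank rule for cut points is chosen for simplicity.

```python
from bisect import bisect_right
from typing import List


def linspace_unit(num: int) -> List[float]:
    """Evenly spaced quantile levels over [0, 1], endpoints included."""
    if num < 2:
        return [0.0]

    return [i / (num - 1) for i in range(num)]


def stripe(values: List[float], levels: List[float]) -> List[int]:
    """Assign each value the index of the last quantile cut point it reaches."""
    ordered = sorted(values)

    # nearest-rank cut point for each quantile level (a simplification)
    cuts = [ordered[round(q * (len(ordered) - 1))] for q in levels]

    # index of the last cut point that is <= the value
    return [bisect_right(cuts, v) - 1 for v in values]
```

With three levels, for instance, values are split into a lower band, an upper band, and a top index that holds only the maximum.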

module types