|
PubTator.key
|
|
|
|
A BioC format for PubTator and other NER tools (i.e., tmChem, DNorm, tmVar, SR4GN or GenNorm) developed at the Biomedical Text Mining group at NCBI
|
|
The goal of this collection is to provide easy access to the text and bio-concept annotations for PMC articles.
|
|
|
|
collection: a group of PubMed documents, each document is organized into title, abstract and other passages
|
|
|
|
source: PubMed, PubMed Central, etc.
|
|
|
|
date: Document download date
|
|
|
|
document: abstract, full-text article, free-text document, etc.
|
|
|
|
id: PubMed ID (or other ID in a given collection) of the document
|
|
|
|
passage: Title, abstract and other passages
|
|
|
|
infon["type"]: "title", "abstract" and other passages
|
|
|
|
offset: Title has an offset of zero, while the other passages (e.g., abstract) are assumed to begin after the previous passages and one space
|
|
|
|
text: Text of the passage
|
|
|
|
annotation: One bio-concept of the passage as determined by the tmChem, DNorm, tmVar, SR4GN or GenNorm
|
|
|
|
infon["type"]: The type of bioconcept, e.g. "Gene", "Species", "Disease", "Chemical" or "Mutation"
|
|
|
|
infon["MeSH"]: The bio-concept identifier in MeSH as detected by DNorm or tmChem
|
|
|
|
infon["OMIM"]: The bio-concept identifier in OMIM as detected by DNorm
|
|
|
|
infon["NCBI_Gene"]: The bio-concept identifier in NCBI Gene as detected by GenNorm
|
|
|
|
infon["NCBI_Taxonomy"]: The bio-concept identifier in NCBI Taxonomy as detected by SR4GN
|
|
|
|
infon["ChEBI"]: The bio-concept identifier in ChEBI as detected by tmChem
|
|
|
|
infon["tmVar"]: The intelligent key generated artificially for the mention detected by tmVar (<Sequence type>|<Mutation type>|<Wild type>|<Mutation position>|<Mutant>)
|
|
|
|
location: location of the mention including the global document "offset" where a bio-concept is located and the "length" of the mention
|
|
|
|
text: Mention of the bio-concept
|
|
|