PubTator.key A BioC format for PubTator and other NER tools (i.e., tmChem, DNorm, tmVar, SR4GN or GenNorm) developed at the Biomedical Text Mining group at NCBI The goal of this collection is to provide easy access to the text and bio-concept annotations for PMC articles. collection: a group of PubMed documents, each document is organized into title, abstract and other passages source: PubMed, PubMed Central, etc. date: Document download date document: abstract, full-text article, free-text document, etc. id: PubMed ID (or other ID in a given collection) of the document passage: Title, abstract and other passages infon["type"]: "title", "abstract" and other passages offset: Title has an offset of zero, while the other passages (e.g., abstract) are assumed to begin after the previous passages and one space text: Text of the passage annotation: One bio-concept of the passage as determined by the tmChem, DNorm, tmVar, SR4GN or GenNorm infon["type"]: The type of bioconcept, e.g. "Gene", "Species", "Disease", "Chemical" or "Mutation" infon["MeSH"]: The bio-concept identifier in MeSH as detected by DNorm or tmChem infon["OMIM"]: The bio-concept identifier in OMIM as detected by DNorm infon["NCBI_Gene"]: The bio-concept identifier in NCBI Gene as detected by GenNorm infon["NCBI_Taxonomy"]: The bio-concept identifier in NCBI Taxonomy as detected by SR4GN infon["ChEBI"]: The bio-concept identifier in ChEBI as detected by tmChem infon["tmVar"]: The intelligent key generated artificially for the mention detected by tmVar (||||) location: location of the mention including the global document "offset" where a bio-concept is located and the "length" of the mention text: Mention of the bio-concept