PubTator.key

A BioC format for PubTator and the other NER tools (i.e., tmChem, DNorm, tmVar, SR4GN and GenNorm) developed by the Biomedical Text Mining group at NCBI.
The goal of this collection is to provide easy access to the text and bio-concept annotations for PMC articles.

	collection:  A group of PubMed documents; each document is organized into a title, an abstract and other passages

	source:  PubMed, PubMed Central, etc. 

	date:  Document download date

	document:  abstract, full-text article, free-text document, etc.
	
	id:  PubMed ID (or other ID in a given collection) of the document 

	passage:  Title, abstract and other passages 

		infon["type"]:  "title", "abstract" or another passage type

		offset:  The title has an offset of zero; each subsequent passage (e.g., the abstract) is assumed to begin one space after the end of the previous passage
		
		text: Text of the passage 

		annotation:  One bio-concept mention in the passage, as detected by tmChem, DNorm, tmVar, SR4GN or GenNorm
				
			infon["type"]:  The type of bio-concept, e.g., "Gene", "Species", "Disease", "Chemical" or "Mutation"
	
			infon["MeSH"]:  The bio-concept identifier in MeSH as detected by DNorm or tmChem
			
			infon["OMIM"]:  The bio-concept identifier in OMIM as detected by DNorm
			
			infon["NCBI_Gene"]:  The bio-concept identifier in NCBI Gene as detected by GenNorm
			
			infon["NCBI_Taxonomy"]:  The bio-concept identifier in NCBI Taxonomy as detected by SR4GN
			
			infon["ChEBI"]:  The bio-concept identifier in ChEBI as detected by tmChem
			
			infon["tmVar"]:  An artificially generated "intelligent key" for the mention detected by tmVar, in the form <Sequence type>|<Mutation type>|<Wild type>|<Mutation position>|<Mutant>
			
			location:  The location of the mention, given as the global document "offset" at which the mention begins and the "length" of the mention

			text: Mention of the bio-concept
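The passage offset rule can be illustrated with a short Python sketch. It assumes the passages are supplied in document order as plain strings. The function name passage_offsets and the sample title and abstract are hypothetical and were made up for illustration.

```python
def passage_offsets(passages):
    """Return the global offset of each passage, given in document order.

    The title starts at offset 0; each later passage starts one character
    (the separating space) after the end of the previous passage.
    """
    offsets = []
    next_offset = 0
    for text in passages:
        offsets.append(next_offset)
        next_offset += len(text) + 1  # passage length plus one space
    return offsets


# Hypothetical sample passages (not taken from a real PubMed record).
title = "BRCA1 mutations in breast cancer."
abstract = "We studied BRCA1 in 100 patients."

offsets = passage_offsets([title, abstract])
print(offsets)  # [0, 34]

# Joining with single spaces places each passage at its computed offset.
document_text = " ".join([title, abstract])
for text, start in zip([title, abstract], offsets):
    assert document_text[start:start + len(text)] == text
```

Each passage already carries its offset explicitly, so a function like this is mainly useful as a consistency check on existing offset values.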
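The infon["tmVar"] key is a "|"-separated string with five fields. The minimal Python sketch below splits such a key into named fields. The class and function names are hypothetical, and the sample key "p|SUB|V|600|E" only illustrates the five-field shape. The mutation position is kept as text because this key file does not state that it is always a plain integer.

```python
from typing import NamedTuple


class TmVarKey(NamedTuple):
    """The five fields of a tmVar key, in the order given by the key file."""
    sequence_type: str
    mutation_type: str
    wild_type: str
    mutation_position: str  # kept as text; not assumed to be an integer
    mutant: str


def parse_tmvar_key(key: str) -> TmVarKey:
    """Split <Sequence type>|<Mutation type>|<Wild type>|<Mutation position>|<Mutant>."""
    fields = key.split("|")
    if len(fields) != 5:
        raise ValueError(f"expected 5 '|'-separated fields, got {len(fields)}: {key!r}")
    return TmVarKey(*fields)


# Illustrative key only.
print(parse_tmvar_key("p|SUB|V|600|E"))
```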
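The following Python sketch shows how the elements above fit together. It reads a small BioC XML fragment with the standard-library xml.etree.ElementTree module, then checks that each annotation's global location points at its mention text. It assumes the usual BioC XML element names (collection, source, date, key, document, id, passage, infon, offset, text, annotation, location). The sample document, its ID and the function name check_annotations are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical BioC fragment written for illustration (not real PubTator output).
SAMPLE = """<collection>
  <source>PubMed</source>
  <date>20240101</date>
  <key>PubTator.key</key>
  <document>
    <id>0000000</id>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>BRCA1 mutations in breast cancer.</text>
      <annotation id="1">
        <infon key="type">Gene</infon>
        <infon key="NCBI_Gene">672</infon>
        <location offset="0" length="5"/>
        <text>BRCA1</text>
      </annotation>
      <annotation id="2">
        <infon key="type">Disease</infon>
        <location offset="19" length="13"/>
        <text>breast cancer</text>
      </annotation>
    </passage>
    <passage>
      <infon key="type">abstract</infon>
      <offset>34</offset>
      <text>We studied BRCA1 in 100 patients.</text>
      <annotation id="3">
        <infon key="type">Gene</infon>
        <infon key="NCBI_Gene">672</infon>
        <location offset="45" length="5"/>
        <text>BRCA1</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def check_annotations(xml_text):
    """Return (type, identifiers, mention, ok) for each annotation.

    ok is True when the annotation's global location, converted to a
    position inside its passage, selects exactly the annotation text.
    """
    root = ET.fromstring(xml_text)
    results = []
    for passage in root.iter("passage"):
        passage_offset = int(passage.findtext("offset"))
        passage_text = passage.findtext("text") or ""  # direct child only
        for ann in passage.findall("annotation"):
            infons = {i.get("key"): i.text for i in ann.findall("infon")}
            loc = ann.find("location")  # only the first location is checked
            start = int(loc.get("offset")) - passage_offset  # global -> local
            end = start + int(loc.get("length"))
            mention = ann.findtext("text")
            ok = passage_text[start:end] == mention
            results.append((infons.pop("type", None), infons, mention, ok))
    return results


for row in check_annotations(SAMPLE):
    print(row)
```

Location offsets are global to the document, so the passage offset is subtracted before the passage text is sliced. Without that step, every annotation after the title would select the wrong span.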