wzkariampuzha committed 1f7bd2d (1 parent: e172865): Update README.md

Files changed (1): README.md (+27 −22)

README.md CHANGED
@@ -50,7 +50,7 @@ license: other
 ## DOCUMENTATION UPDATES IN PROGRESS

 ## Model description
- **EpiExtract4GARD** is a fine-tuned [BioBERT-base-cased](https://huggingface.co/dmis-lab/biobert-base-cased-v1.1) model that is ready to use for **Named Entity Recognition** of locations (LOC), epidemiologic types (EPI), and epidemiologic rates (STAT). This model was fine-tuned on [EpiSet4NER](https://huggingface.co/datasets/ncats/EpiSet4NER) for epidemiological information from rare disease abstracts. See dataset documentation for details on the weakly supervised teaching methods and dataset biases and limitations. See [EpiExtract4GARD on GitHub](https://github.com/ncats/epi4GARD/tree/master/EpiExtract4GARD#epiextract4gard) for details on the entire pipeline.

 #### How to use
 You can use this model with the Hosted inference API to the right with this [test sentence](https://pubmed.ncbi.nlm.nih.gov/21659675/): "27 patients have been diagnosed with PKU in Iceland since 1947. Incidence 1972-2008 is 1/8400 living births."
@@ -116,32 +116,37 @@ B-EPI | Beginning of an epidemiologic type (e.g. "incidence", "prevalence", "
 I-EPI | Epidemiologic type that is not the beginning token.
 B-STAT | Beginning of an epidemiologic rate
 I-STAT | Inside of an epidemiologic rate

 ### EpiSet Statistics

- Beyond any limitations due to the EpiSet4NER dataset, this model is limited in numeracy due to BERT-based model's use of subword embeddings, which is crucial for epidemiologic rate identification and limits the entity-level results. Additionally, more recent weakly supervised learning techniques could be used to improve the performance of the model without improving the underlying dataset.

 ## Training procedure
 This model was trained on an [AWS EC2 p3.2xlarge](https://aws.amazon.com/ec2/instance-types/), which utilized a single Tesla V100 GPU, with these hyperparameters:
- 4 epochs of training (AdamW weight decay = 0.05) with a batch size of 16. Maximum sequence length = 192. Model was fed one sentence at a time. Full config [here](https://wandb.ai/wzkariampuzha/huggingface/runs/353prhts/files/config.yaml).
-
- ## Hold-out validation results
- metric | entity-level result
- -|-
- f1 | 83.8
- precision | 83.2
- recall | 84.5
-
- ## Test results
- | Dataset for Model Training | Evaluation Level | Entity | Precision | Recall | F1 |
- |:--------------------------:|:----------------:|:------------------:|:---------:|:------:|:-----:|
- | EpiSet | Entity-Level | Overall | 0.556 | 0.662 | 0.605 |
- | | | Location | 0.661 | 0.696 | 0.678 |
- | | | Epidemiologic Type | 0.854 | 0.911 | 0.882 |
- | | | Epidemiologic Rate | 0.143 | 0.218 | 0.173 |
- | | Token-Level | Overall | 0.811 | 0.713 | 0.759 |
- | | | Location | 0.949 | 0.742 | 0.833 |
- | | | Epidemiologic Type | 0.900 | 0.917 | 0.908 |
- | | | Epidemiologic Rate | 0.724 | 0.636 | 0.677 |

 Thanks to [@William Kariampuzha](https://github.com/wzkariampuzha) at Axle Informatics/NCATS for contributing this model.
 
 ## DOCUMENTATION UPDATES IN PROGRESS

 ## Model description
+ **EpiExtract4GARD-v2** is a fine-tuned [BioBERT-base-cased](https://huggingface.co/dmis-lab/biobert-base-cased-v1.1) model that is ready to use for **Named Entity Recognition** of locations (LOC), epidemiologic types (EPI), and epidemiologic rates (STAT). This model was fine-tuned on EpiSet4NER-v2 for epidemiological information from rare disease abstracts. See the dataset documentation for details on the weakly supervised teaching methods and on the dataset's biases and limitations. See [EpiExtract4GARD on GitHub](https://github.com/ncats/epi4GARD/tree/master/EpiExtract4GARD#epiextract4gard) for details on the entire pipeline.

 #### How to use
 You can use this model with the Hosted inference API to the right with this [test sentence](https://pubmed.ncbi.nlm.nih.gov/21659675/): "27 patients have been diagnosed with PKU in Iceland since 1947. Incidence 1972-2008 is 1/8400 living births."
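Beyond the hosted widget, the model can be called programmatically; here is a minimal sketch using the `transformers` token-classification pipeline. The Hub model ID `ncats/EpiExtract4GARD-v2` is assumed from this repository; everything else is the standard pipeline API.

```python
# Sketch of programmatic inference with the transformers library.
# Assumption: the model is published on the Hugging Face Hub as
# "ncats/EpiExtract4GARD-v2" (this repository).
TEST_SENTENCE = (
    "27 patients have been diagnosed with PKU in Iceland since 1947. "
    "Incidence 1972-2008 is 1/8400 living births."
)

def extract_epi_entities(text, model_id="ncats/EpiExtract4GARD-v2"):
    # Deferred import: transformers is only needed when the helper is called.
    from transformers import pipeline
    ner = pipeline("token-classification", model=model_id,
                   aggregation_strategy="simple")
    # Returns a list of dicts with "entity_group" (LOC/EPI/STAT),
    # "word", and "score" keys.
    return ner(text)

# Example call (downloads model weights on first use):
# extract_epi_entities(TEST_SENTENCE)
```

Running the helper requires `transformers` with a PyTorch or TensorFlow backend installed.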
 
 I-EPI | Epidemiologic type that is not the beginning token.
 B-STAT | Beginning of an epidemiologic rate
 I-STAT | Inside of an epidemiologic rate
+ +More | Description pending
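After inference, the BIO tags in the table above can be collapsed into entity spans. A minimal, self-contained sketch (the token and tag lists here are hypothetical, for illustration only):

```python
def bio_to_spans(tokens, tags):
    """Collapse BIO tags (B-LOC/I-LOC, B-EPI/I-EPI, B-STAT/I-STAT, O)
    into (entity_type, surface_text) spans."""
    spans, open_span = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            open_span = (tag[2:], [token])    # open a new entity
            spans.append(open_span)
        elif tag.startswith("I-") and open_span and open_span[0] == tag[2:]:
            open_span[1].append(token)        # continue the open entity
        else:
            open_span = None                  # "O" or a stray I- tag closes it
    return [(etype, " ".join(words)) for etype, words in spans]

tokens = ["Incidence", "in", "Iceland", "is", "1/8400", "living", "births"]
tags   = ["B-EPI", "O", "B-LOC", "O", "B-STAT", "I-STAT", "I-STAT"]
print(bio_to_spans(tokens, tags))
# → [('EPI', 'Incidence'), ('LOC', 'Iceland'), ('STAT', '1/8400 living births')]
```

With `aggregation_strategy="simple"`, the transformers pipeline performs an equivalent grouping internally; this sketch only makes the tag scheme concrete.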

 ### EpiSet Statistics

+ Beyond any limitations inherited from the EpiSet4NER dataset, this model is limited in numeracy because BERT-based models use subword embeddings, and numeracy is crucial for identifying epidemiologic rates; this limits the entity-level results. Recent techniques in numeracy could be used to improve the model's performance without further improving the underlying dataset.

 ## Training procedure
 This model was trained on an [AWS EC2 p3.2xlarge](https://aws.amazon.com/ec2/instance-types/), which utilized a single Tesla V100 GPU, with these hyperparameters:
+ 4 epochs of training (AdamW weight decay = 0.05) with a batch size of 16. Maximum sequence length = 192. The model was fed one sentence at a time.
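The hyperparameters above, collected into a plain mapping for quick reference (the key names are illustrative; they are not taken from the original training script or wandb config):

```python
# Training hyperparameters as reported in this card; key names are
# illustrative, not from the original training script.
hyperparams = {
    "epochs": 4,
    "optimizer": "AdamW",
    "weight_decay": 0.05,
    "train_batch_size": 16,
    "max_seq_length": 192,
    "input_granularity": "one sentence per example",
}
print(hyperparams["epochs"], hyperparams["max_seq_length"])  # → 4 192
```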
+
+ <!--- Full config [here](https://wandb.ai/wzkariampuzha/huggingface/runs/353prhts/files/config.yaml). --->
+
+ <!--- THESE ARE NOT THE UPDATED RESULTS --->
+
+ <!--- ## Hold-out validation results --->
+ <!--- metric | entity-level result --->
+ <!--- -|- --->
+ <!--- f1 | 83.8 --->
+ <!--- precision | 83.2 --->
+ <!--- recall | 84.5 --->
+
+ <!--- ## Test results --->
+ <!--- | Dataset for Model Training | Evaluation Level | Entity | Precision | Recall | F1 | --->
+ <!--- |:--------------------------:|:----------------:|:------------------:|:---------:|:------:|:-----:| --->
+ <!--- | EpiSet | Entity-Level | Overall | 0.556 | 0.662 | 0.605 | --->
+ <!--- | | | Location | 0.661 | 0.696 | 0.678 | --->
+ <!--- | | | Epidemiologic Type | 0.854 | 0.911 | 0.882 | --->
+ <!--- | | | Epidemiologic Rate | 0.143 | 0.218 | 0.173 | --->
+ <!--- | | Token-Level | Overall | 0.811 | 0.713 | 0.759 | --->
+ <!--- | | | Location | 0.949 | 0.742 | 0.833 | --->
+ <!--- | | | Epidemiologic Type | 0.900 | 0.917 | 0.908 | --->
+ <!--- | | | Epidemiologic Rate | 0.724 | 0.636 | 0.677 | --->

  Thanks to [@William Kariampuzha](https://github.com/wzkariampuzha) at Axle Informatics/NCATS for contributing this model.