m42-health/Llama3-Med42-70B

cchristophe committed on Jul 2, 2024

Commit fbca64b · verified · 1 Parent(s): e234470

Update README.md

Files changed (1)

README.md +1 -0

@@ -163,6 +163,7 @@ Which response is of higher overall quality in a medical context? Consider:
 [Include Image]
 ### MCQA Evaluation
 Med42-v2 improves performance on every clinical benchmark compared to our previous version, including MedQA, MedMCQA, USMLE, MMLU clinical topics and MMLU Pro clinical subset. For all evaluations reported so far, we use [EleutherAI's evaluation harness library](https://github.com/EleutherAI/lm-evaluation-harness) and report zero-shot accuracies (unless otherwise stated). We integrated chat templates into the harness and computed the likelihood for the full answer instead of only the tokens "a.", "b.", "c." or "d.".
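
The difference between scoring the full answer and scoring only the option letter can be illustrated with a minimal sketch. This is not the harness's code or the authors' integration: `pick_answer`, `toy_loglik`, and the sample options are hypothetical names introduced for illustration, and `toy_loglik` is a hand-written stand-in for a real model's log-likelihood.

```python
from typing import Callable, Sequence

# Signature assumed for illustration: (prompt, continuation) -> log-likelihood.
LogLikelihood = Callable[[str, str], float]

LETTERS = "abcd"


def pick_answer(
    prompt: str,
    options: Sequence[str],
    loglik: LogLikelihood,
    full_answer: bool = True,
) -> int:
    """Return the index of the option whose continuation scores highest."""
    if full_answer:
        # Score the letter together with the answer text, e.g. "b. Insulin".
        continuations = [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    else:
        # Score only the letter token, e.g. "b.".
        continuations = [f"{LETTERS[i]}." for i in range(len(options))]
    scores = [loglik(prompt, c) for c in continuations]
    # max() keeps the first index on ties.
    return max(range(len(scores)), key=scores.__getitem__)


def toy_loglik(prompt: str, continuation: str) -> float:
    """Hypothetical stand-in for a model: every word costs -1 except 'insulin'."""
    favoured = {"insulin"}
    words = continuation.lower().replace(".", " ").split()
    return sum(0.0 if w in favoured else -1.0 for w in words)


prompt = "Which hormone lowers blood glucose?"
options = ["Glucagon", "Insulin", "Cortisol", "Adrenaline"]

full_choice = pick_answer(prompt, options, toy_loglik, full_answer=True)
letter_choice = pick_answer(prompt, options, toy_loglik, full_answer=False)
```

With this toy scorer, the letter-only continuations all receive the same score, so the choice falls back to the first option, while the full-answer continuations let the answer text influence the ranking. A real evaluation would replace `toy_loglik` with the model's summed token log-probabilities for the continuation given the chat-templated prompt.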