m42-health/Llama3-Med42-70B

pkanithi committed on Jul 2

Commit 6d2f66d (1 parent: 1b33683)

Update README.md

Files changed (1)

README.md +1 -1

@@ -131,7 +131,7 @@ The training was conducted on the NVIDIA DGX cluster with H100 GPUs, utilizing P
 ### Open-ended question generation
-To ensure a robust evaluation of our model's output quality, we employ the LLM-as-a-Judge approach using Prometheus-8x7b-v2.0. Our assessment uses carefully curated 4,000 publicly accessible healthcare-related questions, generating responses from various models using the same prompt. We then use Prometheus to conduct pairwise comparisons of the answers. Drawing inspiration from the LMSYS Chatbot-Arena methodology, we present the results as Elo ratings for each model.
+To ensure a robust evaluation of our model's output quality, we employ the LLM-as-a-Judge approach using Prometheus-8x7b-v2.0. Our assessment uses carefully curated 4,000 publicly accessible healthcare-related questions, generating responses from various models. We then use Prometheus to conduct pairwise comparisons of the answers. Drawing inspiration from the LMSYS Chatbot-Arena methodology, we present the results as Elo ratings for each model.
 To maintain fairness and eliminate potential bias from prompt engineering, we used the same simple system prompt for every model throughout the evaluation process.
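
For readers unfamiliar with how pairwise judge verdicts become Elo ratings, the following is a minimal sketch of a standard sequential Elo update. It is not taken from the README: the function names, the K-factor of 32, the initial rating of 1000, and the model names in the usage example are all illustrative assumptions, and the authors may have used a different fitting procedure.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_ratings(comparisons, k=32.0, initial=1000.0):
    """Compute Elo ratings from a sequence of pairwise judgments.

    comparisons: iterable of (model_a, model_b, winner) tuples, where winner
    is model_a, model_b, or any other value (treated as a tie).
    k and initial are assumed defaults, not values from the README.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, winner in comparisons:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        if winner == model_a:
            s_a = 1.0
        elif winner == model_b:
            s_a = 0.0
        else:
            s_a = 0.5  # tie
        delta = k * (s_a - e_a)
        # Zero-sum update: what A gains, B loses.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical judge verdicts for two placeholder models.
    verdicts = [
        ("model_x", "model_y", "model_x"),
        ("model_x", "model_y", "tie"),
    ]
    print(elo_ratings(verdicts))
```

Sequential Elo depends on the order of comparisons, so in practice the judgments are often shuffled and the ratings averaged over several passes; whether that was done here is not stated in the diff.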