Svngoku committed on
Commit d3b0fa6
1 Parent(s): 79b00b6

Update README.md

Files changed (1)
  1. README.md +14 -22
README.md CHANGED
@@ -19,28 +19,20 @@ tags:
  datasets:
  - lavita/AlpaCare-MedInstruct-52k
  metrics:
- - name: Correctness
- class: deepeval.metrics.GEval
- params:
- threshold: 0.8
- model: "gpt-4o-mini"
- criteria: Determine whether the actual output is factually correct based on the expected output, focusing on medical accuracy and adherence to established guidelines.
- evaluation_steps:
- - Check whether the facts in 'actual output' contradict any facts in 'expected output' or established medical guidelines.
- - Heavily penalize omission of critical medical details that could impact patient care or understanding.
- - Ensure that medical terminology and language used are precise and appropriate for medical context.
- - Assess whether the response adequately addresses the specific medical question posed.
- - Vague language or contradicting opinions are acceptable in general contexts, but factual inaccuracies, especially regarding medical data or guidelines, are not.
- evaluation_params:
- - INPUT
- - ACTUAL_OUTPUT
- evaluation:
- - name: Medical Assistant Evaluation
- - description: Evaluate the performance of a medical assistant model on providing detailed explanations about the side effects of Omeprazole.
- execution:
- - batch_size: 1
- - max_length: 1024
- - device: cuda
+ - accuracy
+ model-index:
+ - name: Llama-3.1-8B-AlpaCare-MedInstruct
+ results:
+ - task:
+ type: text-generation
+ dataset:
+ name: MedQuAD
+ type: MedQuAD
+ metrics:
+ - name: Medical Q&A
+ type: Medical Q&A
+ value: 70.00
+ ---
  ---
 
  # Llama-3.1-8B AlpaCare MediInstruct
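The diff view drops the YAML's leading whitespace. The sketch below assumes the added lines follow the Hugging Face `model-index` schema for model-card metadata, in which each `results` entry nests `task`, `dataset`, and `metrics`. Under that assumption, the new front-matter fields would read roughly as follows. The indentation is reconstructed rather than copied from the commit, and the surrounding `---` delimiters are omitted.

```yaml
# Reconstructed indentation (assumption): values are taken from the diff,
# and nesting follows the Hugging Face model-index metadata layout.
metrics:
- accuracy
model-index:
- name: Llama-3.1-8B-AlpaCare-MedInstruct
  results:
  - task:
      type: text-generation
    dataset:
      name: MedQuAD
      type: MedQuAD
    metrics:
    - name: Medical Q&A
      type: Medical Q&A
      value: 70.00
```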