Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ tags:

 # Model Card for Model ID

-Lynx is an open-source
+Lynx is an open-source hallucination evaluation model. Patronus-Lynx-8B-Instruct was trained on a mix of datasets such as CovidQA, PubmedQA, DROP, FinanceBench.
 The datasets contain a mix of hand-annotated and synthetic data. The maximum sequence length is 8000 tokens.


@@ -20,14 +20,14 @@ The datasets contain a mix of hand-annotated and synthetic data. The maximum seq
 - **Model Type:** Patronus-Lynx-8B-Instruct is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct model.
 - **Language:** Primarily English
 - **Developed by:** Patronus AI
-- **License:** [
+- **License:** [https://llama.meta.com/llama3/license](https://llama.meta.com/llama3/license)

 ### Model Sources [optional]

 <!-- Provide the basic links for the model. -->

-- **Repository:** [
-
+- **Repository:** [https://github.com/patronus-ai/Lynx-hallucination-detection](https://github.com/patronus-ai/Lynx-hallucination-detection)
+

 ## How to Get Started with the Model
 The model is fine-tuned to be used to detect faithfulness in a RAG setting. Provided a document, question and answer, the model can evaluate whether the answer is faithful to the document.
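The quick-start text above states the task but the change adds no usage snippet. Below is a minimal sketch of how the faithfulness check could be wired up with Hugging Face `transformers`; the repository id, the prompt wording, and the PASS/FAIL output convention are illustrative assumptions, not details confirmed by this diff.

```python
# Hedged sketch: load the model and ask it to judge answer faithfulness.
# The repo id and prompt template below are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PatronusAI/Patronus-Lynx-8B-Instruct"  # assumed Hub id; check the actual repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

document = "Paris is the capital of France."
question = "What is the capital of France?"
answer = "The capital of France is Berlin."

# Illustrative prompt; the official template may differ.
prompt = (
    "Given the DOCUMENT, QUESTION and ANSWER below, decide whether the ANSWER is "
    "faithful to the DOCUMENT. Respond with PASS or FAIL and a brief reason.\n\n"
    f"DOCUMENT: {document}\nQUESTION: {question}\nANSWER: {answer}"
)

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

A stricter setup would parse only the PASS/FAIL token from the completion, but that is a design choice beyond what the card specifies.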
@@ -66,17 +66,15 @@ The model was finetuned for 3 epochs using H100s on dataset of size 2400. We use

 ### Training Data

-
-
-[More Information Needed]
+We train on 2400 samples consisting of CovidQA, PubmedQA, DROP and RAGTruth samples. For datasets that do not contain hallucinated samples, we generate perturbations to introduce hallucinations in the data. For more details about the data generation process, refer to the paper.

+The training data can be found here: [PatronusAI/drop-RAGTruth-covidqa-pubmed](https://huggingface.co/datasets/PatronusAI/drop-RAGTruth-covidqa-pubmed)

 ## Evaluation

-The model was evaluated on [PatronusAI/hallucination-evaluation-benchmark](https://huggingface.co/datasets/PatronusAI/hallucination-evaluation-benchmark)
-
-<!-- This section describes the evaluation protocols and provides the results. -->
+The model was evaluated on [PatronusAI/hallucination-evaluation-benchmark](https://huggingface.co/datasets/PatronusAI/hallucination-evaluation-benchmark).

+It outperforms GPT-3.5-Turbo, GPT-4-Turbo, GPT-4o and Claude Sonnet.

 ## Citation [optional]

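The new Training Data paragraph points at a dataset on the Hub; a short sketch of pulling it with the `datasets` library follows. The split name is an assumption, and the column layout has to be read off the loaded dataset since the diff does not show the schema.

```python
# Hedged sketch: inspect the linked training set.
# The "train" split name is an assumption; check ds.column_names for the schema.
from datasets import load_dataset

ds = load_dataset("PatronusAI/drop-RAGTruth-covidqa-pubmed", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # one example (hand-annotated or perturbed)
```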
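For the Evaluation section, the referenced benchmark can be pulled the same way; again the split name and any label field are assumptions, not something this change documents.

```python
# Hedged sketch: load the benchmark named in the Evaluation section.
# The "test" split is an assumption; read the label field off bench.column_names.
from datasets import load_dataset

bench = load_dataset("PatronusAI/hallucination-evaluation-benchmark", split="test")
print(bench.column_names)
for row in bench.select(range(3)):
    print(row)
```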