Update README.md
README.md
pipeline_tag: text-classification
---
# The reference-based LLM hallucination detector

This LLM hallucination detector, based on a self-adaptive hierarchical [XLM-RoBERTa-XL](https://huggingface.co/facebook/xlm-roberta-xl), was developed to participate in [SemEval-2024 Task-6 - SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes](https://helsinki-nlp.github.io/shroom) (model-agnostic track).
## Model description

This model was a component of my solution for the [SemEval-2024 Task-6 - SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes](https://helsinki-nlp.github.io/shroom). The competition goal was to develop the best algorithm for detecting an LLM's hallucinations, i.e. grammatically sound output that contains incorrect semantic information (unsupported by or inconsistent with the source input). The competition organizers prepared two different setups:
- the model-aware track, where the developed detector can have access to the LLM that produced the output;

- the model-agnostic track, where the developed detector has no access to the verified LLM and uses only that LLM's source input and generated output.
This model was designed to detect hallucinations in the model-agnostic track. It is based on **four simple ideas**:
1. **The detector is a Transformer encoder** based on [XLM-RoBERTa-XL](https://huggingface.co/facebook/xlm-roberta-xl). Hallucination detection is a binary classification problem, and we don't need any decoder, such as [SelfCheckGPT](https://arxiv.org/abs/2303.08896), to solve it. Make encoders great again!
2. **Prompt engineering matters for encoders too** (not only for decoders). A specially designed text prompt is a good inductive bias for a text classifier.
3. **The text classifier needs a self-adaptive hierarchy of the encoder's hidden layers**. The classifier does not have to be built on top of the last hidden layer: perhaps one of the earlier hidden layers would be more useful. We don't know this in advance, so we use a special gating network to automatically estimate the importance of the encoder's various hidden layers during training.
4. **A two-stage fine-tuning is all you need**. At the first stage, we fine-tune our self-adaptive hierarchical encoder as a sentence embedder using contrastive learning. At the second stage, we fine-tune this model as a usual classifier, starting from the embedder's checkpoint. This approach was proposed in the paper "Contrastive fine-tuning to improve generalization in deep NER" (DOI: [10.28995/2075-7182-2022-21-70-80](https://www.dialog-21.ru/media/5751/bondarenkoi113.pdf)), but it works for other NLU tasks too.
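The self-adaptive hierarchy idea above can be sketched as a tiny gating module: one learnable logit per encoder hidden layer, with the classifier consuming the softmax-weighted sum of per-layer sentence vectors. This is an illustrative toy in NumPy, not this model's actual implementation; the shape conventions and initialization are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

class HiddenLayerGate:
    """Toy gating network over encoder hidden layers (illustrative only).

    Each hidden layer gets one learnable logit; the classifier input is
    the softmax-weighted sum of per-layer sentence vectors, so training
    can shift importance toward whichever layers prove most useful."""

    def __init__(self, num_layers: int):
        # Learned during training; zeros give a uniform initial weighting.
        self.layer_logits = np.zeros(num_layers)

    def combine(self, layer_states: np.ndarray) -> np.ndarray:
        # layer_states: (num_layers, hidden_dim) sentence vectors,
        # one per encoder hidden layer.
        weights = softmax(self.layer_logits)
        return (weights[:, None] * layer_states).sum(axis=0)

gate = HiddenLayerGate(num_layers=4)
states = np.stack([np.full(8, float(i)) for i in range(4)])  # fake hidden states
pooled = gate.combine(states)  # uniform weights at init -> mean over layers
```

In a real model the logits would be trained jointly with the classifier head (e.g. as a small linear gate), but the weighted-sum mechanics are the same.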
## Intended uses & limitations
This model is primarily aimed at reference-based detection of hallucinations in an LLM without any additional information about the LLM's type and architecture (i.e. in model-agnostic mode). Reference-based detection means that the hallucination detector considers not only the human question and the answer generated by the verified LLM, but also the reference answer to that question. Therefore, in situations where the reference answer is unknown, this hallucination detector is not applicable. But in some cases (for example, when we analyze an LLM's responses on an annotated test set and want to separate hallucinations from ordinary errors such as undergeneration, part-of-speech errors, and so on), the reference answers are known, and then the proposed detector is extremely useful.
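For illustration, here is a hypothetical sketch of how the three inputs of reference-based detection (the question, the reference answer, and the verified LLM's answer) could be combined into a single text for a classifier. The template and field labels are assumptions for illustration only, not this model's actual prompt format, and the sample strings are made up.

```python
def build_detector_input(src: str, tgt: str, hyp: str) -> str:
    # src: the human question (the verified LLM's input)
    # tgt: the reference ("gold") answer
    # hyp: the answer generated by the verified LLM
    # The template below is purely illustrative.
    return (f"Question: {src} "
            f"Reference answer: {tgt} "
            f"LLM answer: {hyp}")

text = build_detector_input(
    src="What is the meaning of the word 'weaselly'?",   # made-up example
    tgt="Resembling a weasel.",                          # made-up reference
    hyp="Resembling or characteristic of a weasel.",
)
```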
This model is capable of detecting LLM hallucinations that occur when solving the following NLG tasks: paraphrase generation, machine translation, and definition modeling.
## Evaluation
The final ranking of all model-agnostic solutions on the test data is available in [the ranking agnostic CSV file](https://drive.google.com/file/d/1J-iRpZ1LQSkTnTKJaSFfJHa5zxoHd__K/view?usp=sharing) on the [SHROOM web page](https://helsinki-nlp.github.io/shroom). The accuracy score of my solution is 0.77, which ranks 28th out of 49. The model described here is one component of my solution; its accuracy as an independent algorithm is 0.7153. For comparison, the accuracy of the baseline system based on SelfCheckGPT is 0.6967.
## Usage
You need to install [pytorch-metric-learning](https://github.com/KevinMusgrave/pytorch-metric-learning) to use this model. After that, you can use this model directly with a pipeline for text classification:
```python
# The input data format is based on data for the model-agnostic track of SHROOM
# https://helsinki-nlp.github.io/shroom
# "src" is the verified LLM's input used to start generation
# "hyp" is an output generated by this LLM
# "tgt" is a reference output from the point of view of human assessors
input_data = [
    {
        "hyp": "Resembling or characteristic of a weasel.",
```

The end of the script's printed output looks like this:

```
p(Hallucination): 0.487
```
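The printed probability can be reduced to a binary hallucination/no-hallucination decision with a simple cutoff. A minimal sketch follows; the 0.5 threshold is an assumption for illustration, not a value taken from this model card.

```python
def is_hallucination(p_hallucination: float, threshold: float = 0.5) -> bool:
    # Flag the sample as a hallucination when the detector's probability
    # reaches the (assumed) decision threshold.
    return p_hallucination >= threshold

label = is_hallucination(0.487)  # the probability printed above -> False
```

In practice, the threshold can be tuned on a validation set to trade precision against recall.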
A Google Colaboratory version of [this script](https://colab.research.google.com/drive/1T5LOuYfLNI3bqz6W-Y6kEajk3SumxyqU?usp=sharing) is also available.
## Citation
If you want to cite this model, you can use the following BibTeX entry:
```bibtex
@misc{bondarenko2024hallucination,
  title={The reference-based detector of LLM hallucinations by Ivan Bondarenko},
  author={Bondarenko, Ivan},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/bond005/xlm-roberta-xl-hallucination-detector}},
  year={2024}
}
```