metadata
license: apache-2.0
datasets:
  - FreedomIntelligence/RAG-Instruct
language:
  - en
metrics:
  - accuracy
base_model:
  - meta-llama/Llama-3.1-8B
pipeline_tag: text-generation

⚑ Introduction

RAG-Instruct is a method for generating diverse and high-quality RAG instruction data. It synthesizes instruction datasets based on any source corpus, leveraging the following approaches:

  • Five RAG paradigms, which represent diverse query-document relationships to enhance model generalization across tasks.
  • Instruction simulation, which enriches instruction diversity and quality by utilizing the strengths of existing instruction datasets.

Using this approach, we constructed a 40K instruction dataset from Wikipedia that covers a wide range of RAG scenarios and tasks. Adding RAG-Instruct to Llama3.2-3B and Llama3.1-8B raises scores on all 11 benchmarks in the table below; for Llama3.1-8B the gains range from 1.3 points (CFQA) to 20.5 points (Pub).
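At inference time, a RAG model receives retrieved passages together with the user question. The sketch below shows one way such a prompt could be assembled. The template layout, the function name `build_rag_prompt`, and the example passages are hypothetical illustrations, not the format the RAG-Instruct authors used for training.

```python
from typing import Sequence


def build_rag_prompt(question: str, documents: Sequence[str]) -> str:
    """Join numbered passages and a question into one prompt string.

    The section markers below are an assumed, generic RAG template,
    not the template used to train RAG-Instruct.
    """
    if not documents:
        # Nothing was retrieved: fall back to the bare question.
        return f"### Instruction:\n{question}\n\n### Response:\n"

    # Number the passages from 1 so an answer can refer to them by index.
    passages = "\n".join(
        f"[{i}] {doc.strip()}" for i, doc in enumerate(documents, start=1)
    )
    return (
        f"### Paragraph:\n{passages}\n\n"
        f"### Instruction:\n{question}\n\n"
        "### Response:\n"
    )


prompt = build_rag_prompt(
    "In which city is the Eiffel Tower?",
    ["The Eiffel Tower is a landmark in Paris.", "Paris is the capital of France."],
)
print(prompt)
```

The resulting string can be passed to any text-generation interface. Passages that are only loosely related to the question can be included the same way, which is the kind of varied query-document relationship the paradigms above are meant to cover.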

| Model | WQA (acc) | PQA (acc) | TQA (acc) | OBQA (EM) | Pub (EM) | ARC (EM) | 2WIKI (acc) | HotP (acc) | MSQ (acc) | CFQA (EM) | PubMed (EM) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama3.2-3B | 58.7 | 61.8 | 69.7 | 77.0 | 55.0 | 66.8 | 55.6 | 40.2 | 13.2 | 46.8 | 70.3 |
| Llama3.1-8B | 59.5 | 60.8 | 73.4 | 82.0 | 56.7 | 77.1 | 65.6 | 45.6 | 18.7 | 56.5 | 73.9 |
| Llama3.2-3B + RAG-Instruct | 65.3 | 64.0 | 77.0 | 81.2 | 66.4 | 73.0 | 72.9 | 52.7 | 25.0 | 50.3 | 72.6 |
| Llama3.1-8B + RAG-Instruct | 69.7 | 68.4 | 79.3 | 84.8 | 77.2 | 79.9 | 79.3 | 56.4 | 30.3 | 57.8 | 77.0 |
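
To make the size of the improvement concrete, the per-benchmark gains can be computed directly from the scores above. This is a small arithmetic sketch. The numbers are copied from the table, while the helper names `gains` and `mean_gain` are introduced here for illustration only.

```python
BENCHMARKS = ["WQA", "PQA", "TQA", "OBQA", "Pub", "ARC",
              "2WIKI", "HotP", "MSQ", "CFQA", "PubMed"]

# Scores copied from the results table, in BENCHMARKS order.
SCORES = {
    "Llama3.2-3B": [58.7, 61.8, 69.7, 77.0, 55.0, 66.8, 55.6, 40.2, 13.2, 46.8, 70.3],
    "Llama3.1-8B": [59.5, 60.8, 73.4, 82.0, 56.7, 77.1, 65.6, 45.6, 18.7, 56.5, 73.9],
    "Llama3.2-3B + RAG-Instruct": [65.3, 64.0, 77.0, 81.2, 66.4, 73.0, 72.9, 52.7, 25.0, 50.3, 72.6],
    "Llama3.1-8B + RAG-Instruct": [69.7, 68.4, 79.3, 84.8, 77.2, 79.9, 79.3, 56.4, 30.3, 57.8, 77.0],
}


def gains(base: str, tuned: str) -> dict[str, float]:
    """Per-benchmark difference (tuned minus base), rounded to one decimal."""
    return {
        name: round(t - b, 1)
        for name, b, t in zip(BENCHMARKS, SCORES[base], SCORES[tuned])
    }


def mean_gain(base: str, tuned: str) -> float:
    """Unweighted average gain across all benchmarks."""
    g = gains(base, tuned)
    return sum(g.values()) / len(g)


for base in ("Llama3.2-3B", "Llama3.1-8B"):
    tuned = f"{base} + RAG-Instruct"
    g = gains(base, tuned)
    best = max(g, key=g.get)  # benchmark with the largest improvement
    print(f"{base}: mean gain {mean_gain(base, tuned):.1f}, largest on {best} (+{g[best]})")
```

The average is unweighted and mixes accuracy and exact-match metrics, so it is only a rough summary of the table, not a metric reported by the authors.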

📖 Citation

@misc{liu2024raginstructboostingllmsdiverse,
      title={RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions}, 
      author={Wanlong Liu and Junying Chen and Ke Ji and Li Zhou and Wenyu Chen and Benyou Wang},
      year={2024},
      eprint={2501.00353},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.00353}, 
}