Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language models

Tian Yu, Shaolei Zhang, and Yang Feng*

Model Details

  • Discription: This is Auto-RAG model trained with synthesized iterative retrieval instruction data. Details can be found in our paper.
  • Developed by: ICTNLP Group. Authors: Tian Yu, Shaolei Zhang and Yang Feng.
  • Github Repository: https://github.com/ictnlp/Auto-RAG
  • Paper Link: https://arxiv.org/abs/2411.19443
  • Finetuned from model: Meta-Llama3-8B-Instruct

Uses

You can directly deploy the model using vllm, such as:

CUDA_VISIBLE_DEVICES=6,7 python -m vllm.entrypoints.openai.api_server \
    --model PATH_TO_MODEL\
    --gpu-memory-utilization 0.9 \
    -tp 2 \
    --max-model-len 8192\
    --port 8000\
    --host 0.0.0.0

Citation

@article{yu2024autorag,
      title={Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models}, 
      author={Tian Yu and Shaolei Zhang and Yang Feng},
      year={2024},
      eprint={2411.19443},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.19443}, 
}
Downloads last month
68
Safetensors
Model size
8.03B params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for ICTNLP/Auto-RAG-Llama-3-8B-Instruct

Quantizations
2 models