Yutian010313 committed (verified)
Commit 6bb387e · 1 Parent(s): 7a08336

Update README.md

Files changed (1):
  1. README.md +10 -23
README.md CHANGED
@@ -13,9 +13,10 @@ license: apache-2.0
 <!-- Provide a longer summary of what this model is. -->
 
 
-- **Description:** These are the LoRA weights obtained by training with synthesized iterative retrieval instruction data. Details can be found in our paper.
+- **Description:** This is the Auto-RAG model, trained with synthesized iterative retrieval instruction data. Details can be found in our paper.
 - **Developed by:** ICTNLP Group. Authors: Tian Yu, Shaolei Zhang and Yang Feng.
 - **Github Repository:** https://github.com/ictnlp/Auto-RAG
+- **Paper Link:** https://arxiv.org/abs/2411.19443
 - **Finetuned from model:** Meta-Llama3-8B-Instruct
 
 
@@ -23,31 +24,17 @@ license: apache-2.0
 
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
-Merge the Meta-Llama3-8B-Instruct weights and the adapter weights.
-
+You can deploy the model directly with vLLM, for example:
+
 ```
-from transformers import AutoTokenizer, LlamaForCausalLM
-from peft import PeftModel
-
-# Load the base model on CPU
-model = LlamaForCausalLM.from_pretrained(PATH_TO_META_LLAMA3_8B_INSTRUCT,
-                                         device_map="cpu")
-
-# Attach the LoRA adapter to the base model
-model = PeftModel.from_pretrained(model, PATH_TO_ADAPTER)
-
-tokenizer = AutoTokenizer.from_pretrained(PATH_TO_META_LLAMA3_8B_INSTRUCT)
-
-# Merge the adapter weights into the base model and save the result
-model = model.merge_and_unload()
-model.save_pretrained(SAVE_PATH)
-tokenizer.save_pretrained(SAVE_PATH)
+CUDA_VISIBLE_DEVICES=6,7 python -m vllm.entrypoints.openai.api_server \
+    --model PATH_TO_MODEL \
+    --gpu-memory-utilization 0.9 \
+    -tp 2 \
+    --max-model-len 8192 \
+    --port 8000 \
+    --host 0.0.0.0
 ```
 
-Subsequently, you can deploy using frameworks such as vLLM.
-
 ## Citation
 
 ```
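
The `api_server` entrypoint started by the vLLM command above serves the standard OpenAI-compatible `/v1/chat/completions` route. A minimal client sketch, assuming the server is reachable at `localhost:8000` and that the `model` field matches the `--model` value (here the placeholder `PATH_TO_MODEL` stands in for your actual path):

```python
import json
import urllib.request

# Assumptions: these must match the vllm api_server command above.
MODEL = "PATH_TO_MODEL"            # placeholder for your actual model path
BASE_URL = "http://localhost:8000" # --port 8000; server bound via --host 0.0.0.0


def build_chat_request(prompt, model=MODEL, temperature=0.0):
    """Build an OpenAI-style chat-completions payload for a single user turn."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def ask(prompt):
    """POST the payload to the OpenAI-compatible endpoint and return the reply text."""
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(ask("Hello"))
```

This is a sketch, not part of the original README; any HTTP client (or the `openai` Python package pointed at `BASE_URL`) works the same way against this endpoint.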