Yutian010313 committed (verified)
Commit 6bb387e · 1 Parent(s): 7a08336

Update README.md

Files changed (1):
  1. README.md +10 -23
README.md CHANGED
@@ -13,9 +13,10 @@ license: apache-2.0
 <!-- Provide a longer summary of what this model is. -->
 
 
-- **Description:** These are the LoRA weights obtained by training with synthesized iterative retrieval instruction data. Details can be found in our paper.
+- **Description:** This is the Auto-RAG model, trained with synthesized iterative retrieval instruction data. Details can be found in our paper.
 - **Developed by:** ICTNLP Group. Authors: Tian Yu, Shaolei Zhang and Yang Feng.
 - **Github Repository:** https://github.com/ictnlp/Auto-RAG
+- **Paper Link:** https://arxiv.org/abs/2411.19443
 - **Finetuned from model:** Meta-Llama3-8B-Instruct
 
 
@@ -23,31 +24,17 @@ license: apache-2.0
 
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
-Merge the Meta-Llama3-8B-Instruct weights and the adapter weights.
-
+You can deploy the model directly with vLLM, for example:
+
 ```
-from transformers import AutoTokenizer, LlamaForCausalLM
-from peft import PeftModel
-
-# Load the base model on CPU
-model = LlamaForCausalLM.from_pretrained(PATH_TO_META_LLAMA3_8B_INSTRUCT,
-                                         device_map="cpu")
-
-# Attach the LoRA adapter to the base model
-model = PeftModel.from_pretrained(model, PATH_TO_ADAPTER)
-
-tokenizer = AutoTokenizer.from_pretrained(PATH_TO_META_LLAMA3_8B_INSTRUCT)
-
-# Merge the adapter weights into the base model and save the result
-model = model.merge_and_unload()
-model.save_pretrained(SAVE_PATH)
-tokenizer.save_pretrained(SAVE_PATH)
+CUDA_VISIBLE_DEVICES=6,7 python -m vllm.entrypoints.openai.api_server \
+    --model PATH_TO_MODEL \
+    --gpu-memory-utilization 0.9 \
+    -tp 2 \
+    --max-model-len 8192 \
+    --port 8000 \
+    --host 0.0.0.0
 ```
 
-Subsequently, you can deploy using frameworks such as vLLM.
-
 ## Citation
 
 ```
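
The `api_server` entrypoint started by the vLLM command above serves the standard OpenAI-compatible `/v1/chat/completions` route. A minimal client sketch, assuming the server is reachable at `localhost:8000` and that the `model` field matches the `--model` value (here the placeholder `PATH_TO_MODEL` stands in for your actual path):

```python
import json
import urllib.request

# Assumptions: these must match the vllm api_server command above.
MODEL = "PATH_TO_MODEL"            # placeholder for your actual model path
BASE_URL = "http://localhost:8000" # --port 8000; server bound via --host 0.0.0.0


def build_chat_request(prompt, model=MODEL, temperature=0.0):
    """Build an OpenAI-style chat-completions payload for a single user turn."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def ask(prompt):
    """POST the payload to the OpenAI-compatible endpoint and return the reply text."""
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(ask("Hello"))
```

This is a sketch, not part of the original README; any HTTP client (or the `openai` Python package pointed at `BASE_URL`) works the same way against this endpoint.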