zhangce committed on
Commit 5852170 · 1 Parent(s): 0c964a1

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -10,11 +10,11 @@ language:
  library_name: transformers
  ---
 
- # Llama-2-7B-32K-beta
+ # LLaMA-2-7B-32K
 
  ## Model Description
 
- Llama-2-7B-32K-beta is an open-source, long context language model developed by Together, fine-tuned from Meta's original Llama-2 7B model.
+ LLaMA-2-7B-32K is an open-source, long context language model developed by Together, fine-tuned from Meta's original Llama-2 7B model.
  This model represents our efforts to contribute to the rapid progress of the open-source ecosystem for large language models.
  The model has been extended to a context length of 32K with position interpolation,
  allowing applications on multi-document QA, long text summarization, etc.
@@ -44,7 +44,7 @@ To enhance the long-context ability, we exclude data shorter than 2K word. The i
 
  Next, we provide examples of how to fine-tune the model for specific applications.
  The example datasets are placed in [togethercomputer/Long-Data-Collections](https://huggingface.co/datasets/togethercomputer/Long-Data-Collections)
- You can use the [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) to fine-tune your own 32K model over Llama-2-7B-32K-beta.
+ You can use the [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) to fine-tune your own 32K model over LLaMA-2-7B-32K.
  Please refer to [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) for step-by-step illustrations.
 
  1. Long Context QA.
@@ -68,7 +68,7 @@ Please refer to [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) f
 
  ## Inference
 
- You can use the [Together API](https://together.ai/blog/api-announcement) to try out Llama-2-7B-32K-beta for inference.
+ You can use the [Together API](https://together.ai/blog/api-announcement) to try out LLaMA-2-7B-32K for inference.
  The updated inference stack allows for efficient inference.
 
  To run the model locally, we strongly recommend to install Flash Attention V2, which is necessary to obtain the best performance:
@@ -87,8 +87,8 @@ You can use this model directly from the Hugging Face Model Hub or fine-tune it
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
 
- tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Llama-2-7B-32K-beta")
- model = AutoModelForCausalLM.from_pretrained("togethercomputer/Llama-2-7B-32K-beta", trust_remote_code=True, torch_dtype=torch.float16)
+ tokenizer = AutoTokenizer.from_pretrained("togethercomputer/LLaMA-2-7B-32K")
+ model = AutoModelForCausalLM.from_pretrained("togethercomputer/LLaMA-2-7B-32K", trust_remote_code=True, torch_dtype=torch.float16)
 
  input_context = "Your text here"
  input_ids = tokenizer.encode(input_context, return_tensors="pt")
@@ -102,7 +102,7 @@ Alternatively, you can set `trust_remote_code=False` if you prefer not to use fl
 
  ## Limitations and Bias
 
- As with all language models, Llama-2-7B-32K-beta may generate incorrect or biased content. It's important to keep this in mind when using the model.
+ As with all language models, LLaMA-2-7B-32K may generate incorrect or biased content. It's important to keep this in mind when using the model.
 
  ## Community
 