This model was converted to GGUF format from [`allenai/Llama-3.1-Tulu-3-8B`](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) for more details on the model.
---
The chat template for our models is formatted as:

```
<|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

Or with new lines expanded:

```
<|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```
It is embedded in the tokenizer as well, for use with `tokenizer.apply_chat_template`.
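As a minimal, self-contained sketch of what the template above produces, the function below is an illustrative re-implementation (its name is hypothetical; in practice you would call `tokenizer.apply_chat_template` from the `transformers` library):

```python
def apply_tulu_template(messages, add_generation_prompt=False):
    """Illustrative re-implementation of the chat template shown above.

    Each message is rendered as <|role|>\n<content>; assistant turns end
    with <|endoftext|>, other turns end with a newline.
    """
    parts = []
    for m in messages:
        suffix = "<|endoftext|>" if m["role"] == "assistant" else "\n"
        parts.append(f"<|{m['role']}|>\n{m['content']}{suffix}")
    if add_generation_prompt:
        # Leave the prompt open for the model to continue as the assistant.
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = apply_tulu_template(
    [{"role": "user", "content": "How are you doing?"}],
    add_generation_prompt=True,
)
print(prompt)
# <|user|>
# How are you doing?
# <|assistant|>
```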
## System prompt

In Ai2 demos, we use this system prompt by default:

```
You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.
```

The model has not been trained with a specific system prompt in mind.
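When using the chat template, the system prompt is simply a leading message with the `system` role; a minimal sketch:

```python
# Prepending the Ai2 demo system prompt to a conversation.
# The variable names here are illustrative, not from the Tülu 3 codebase.
SYSTEM_PROMPT = (
    "You are Tulu 3, a helpful and harmless AI Assistant "
    "built by the Allen Institute for AI."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "How are you doing?"},
]
```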
## Bias, Risks, and Limitations

The Tülu 3 models have limited safety training and are not deployed with automatic in-the-loop filtering of responses like ChatGPT, so the models can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus used to train the base Llama 3.1 models are also unknown, but they likely included a mix of web data and technical sources such as books and code. See the Falcon 180B model card for an example of this.
## Hyperparameters

PPO settings for RLVR:

- Learning Rate: 3 × 10⁻⁷
- Discount Factor (gamma): 1.0
- Generalized Advantage Estimation (lambda): 0.95
- Mini-batches (N_mb): 1
- PPO Update Iterations (K): 4
- PPO's Clipping Coefficient (epsilon): 0.2
- Value Function Coefficient (c1): 0.1
- Gradient Norm Threshold: 1.0
- Learning Rate Schedule: Linear
- Generation Temperature: 1.0
- Batch Size (effective): 512
- Max Token Length: 2,048
- Max Prompt Token Length: 2,048
- Penalty Reward Value for Responses without an EOS Token: -10.0
- Response Length: 1,024 (but 2,048 for MATH)
- Total Episodes: 100,000
- KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
- Warm up ratio (omega): 0.0
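For reference, the settings above can be collected into a single config object; this is a hypothetical sketch whose field names are illustrative and do not come from the Tülu 3 training code:

```python
from dataclasses import dataclass, field


@dataclass
class RLVRPPOConfig:
    """Illustrative container for the PPO-for-RLVR settings listed above."""

    learning_rate: float = 3e-7
    gamma: float = 1.0                  # discount factor
    gae_lambda: float = 0.95            # generalized advantage estimation
    num_mini_batches: int = 1           # N_mb
    ppo_update_iters: int = 4           # K
    clip_epsilon: float = 0.2
    vf_coef: float = 0.1                # c1
    max_grad_norm: float = 1.0
    lr_schedule: str = "linear"
    temperature: float = 1.0
    effective_batch_size: int = 512
    max_token_length: int = 2048
    max_prompt_token_length: int = 2048
    eos_penalty_reward: float = -10.0   # reward for responses without EOS
    response_length: int = 1024         # 2048 for MATH
    total_episodes: int = 100_000
    kl_betas: list = field(default_factory=lambda: [0.1, 0.05, 0.03, 0.01])
    warmup_ratio: float = 0.0           # omega


cfg = RLVRPPOConfig()
print(cfg.learning_rate)  # 3e-07
```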
## License and use

All Llama 3.1 Tülu 3 models are released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. Tülu 3 is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

The models have been fine-tuned using a dataset mix with outputs generated from third-party models and are subject to additional terms: the Gemma Terms of Use and the Qwen License Agreement (the models were improved using Qwen 2.5).
## Citation

If Tülu 3 or any of the related materials were helpful to your work, please cite:

```bibtex
@article{lambert2024tulu3,
  title  = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
  author = {
    Nathan Lambert and
    Jacob Morrison and
    Valentina Pyatkin and
    Shengyi Huang and
    Hamish Ivison and
    Faeze Brahman and
    Lester James V. Miranda and
    Alisa Liu and
    Nouha Dziri and
    Shane Lyu and
    Yuling Gu and
    Saumya Malik and
    Victoria Graf and
    Jena D. Hwang and
    Jiangjiang Yang and
    Ronan Le Bras and
    Oyvind Tafjord and
    Chris Wilhelm and
    Luca Soldaini and
    Noah A. Smith and
    Yizhong Wang and
    Pradeep Dasigi and
    Hannaneh Hajishirzi
  },
  year   = {2024},
  email  = {tulu@allenai.org}
}
```
---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux):
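A minimal sketch, assuming Homebrew and a recent llama.cpp build; the repo id and quant filename below are placeholders, not the actual values for this repo:

```shell
brew install llama.cpp

# Run inference with the CLI. Replace <repo-id> with this repo's id and
# <model-file> with the GGUF quant file actually present in it.
llama-cli --hf-repo <repo-id> --hf-file <model-file>.gguf \
    -p "The meaning to life and the universe is"
```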