This model was converted to GGUF format from [`allenai/Llama-3.1-Tulu-3-8B`](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) for more details on the model.
---

The chat template for our models is formatted as:

```
<|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

Or with new lines expanded:

```
<|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

It is embedded within the tokenizer as well, for `tokenizer.apply_chat_template`.
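As an illustrative sketch (not part of the model's tooling; `format_tulu_chat` is a hypothetical helper, and the authoritative template is the one embedded in the tokenizer), the format above can be reproduced in a few lines:

```python
def format_tulu_chat(messages):
    """Render a list of {role, content} dicts in the Tülu 3 chat format
    shown above: each turn is '<|role|>' on its own line followed by the
    content, and the conversation ends with '<|endoftext|>'."""
    turns = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    return "\n".join(turns) + "<|endoftext|>"


messages = [
    {"role": "user", "content": "How are you doing?"},
    {"role": "assistant", "content": "I'm just a computer program."},
]
print(format_tulu_chat(messages))
```

In practice, prefer `tokenizer.apply_chat_template(messages, tokenize=False)`, which renders with the template stored in the tokenizer itself.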

## System prompt

In Ai2 demos, we use this system prompt by default:

```
You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.
```

The model has not been trained with a specific system prompt in mind.
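A minimal sketch of how that default prompt would be passed in (the variable names are illustrative; the standard Hugging Face `messages` list shape is assumed):

```python
# Default system prompt used in Ai2 demos. It is optional, since the
# model was not trained around any specific system prompt.
SYSTEM_PROMPT = (
    "You are Tulu 3, a helpful and harmless AI Assistant "
    "built by the Allen Institute for AI."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "How are you doing?"},
]
# This list can then be rendered with tokenizer.apply_chat_template(...).
```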

## Bias, Risks, and Limitations

The Tülu 3 models have limited safety training and are not deployed with automatic in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus used to train the base Llama 3.1 models are unknown, but it is likely to have included a mix of web data and technical sources like books and code. See the Falcon 180B model card for an example of this.

## Hyperparameters

PPO settings for RLVR:

- Learning rate: 3 × 10⁻⁷
- Discount factor (gamma): 1.0
- Generalized advantage estimation (lambda): 0.95
- Mini-batches (N_mb): 1
- PPO update iterations (K): 4
- PPO clipping coefficient (epsilon): 0.2
- Value function coefficient (c1): 0.1
- Gradient norm threshold: 1.0
- Learning rate schedule: linear
- Generation temperature: 1.0
- Batch size (effective): 512
- Max token length: 2,048
- Max prompt token length: 2,048
- Penalty reward value for responses without an EOS token: -10.0
- Response length: 1,024 (but 2,048 for MATH)
- Total episodes: 100,000
- KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
- Warm-up ratio (omega): 0.0
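For orientation, here is a small sketch of where two of the settings above enter the standard PPO/GAE formulas (textbook equations, not Ai2's training code; function names are illustrative):

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate objective for one sample (to be maximized).
    epsilon is the clipping coefficient listed above."""
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized advantage estimation with the gamma/lambda listed above.
    `values` needs one extra bootstrap entry (len(rewards) + 1)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual for step t, discounted by gamma
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of residuals, controlled by lambda
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```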

## License and use

All Llama 3.1 Tülu 3 models are released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. Tülu 3 is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

The models have been fine-tuned using a dataset mix with outputs generated from third-party models and are subject to additional terms: the Gemma Terms of Use and the Qwen License Agreement (the models were improved using Qwen 2.5).

## Citation

If Tülu 3 or any of the related materials were helpful to your work, please cite:

```
@article{lambert2024tulu3,
  title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
  author = {
    Nathan Lambert and Jacob Morrison and Valentina Pyatkin and Shengyi Huang and
    Hamish Ivison and Faeze Brahman and Lester James V. Miranda and Alisa Liu and
    Nouha Dziri and Shane Lyu and Yuling Gu and Saumya Malik and Victoria Graf and
    Jena D. Hwang and Jiangjiang Yang and Ronan Le Bras and Oyvind Tafjord and
    Chris Wilhelm and Luca Soldaini and Noah A. Smith and Yizhong Wang and
    Pradeep Dasigi and Hannaneh Hajishirzi
  },
  year = {2024},
  email = {tulu@allenai.org}
}
```

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)