|
--- |
|
license: llama3.1 |
|
language: |
|
- en |
|
pipeline_tag: text-generation |
|
datasets: |
|
- allenai/RLVR-GSM-MATH-IF-Mixed-Constraints |
|
base_model: allenai/Llama-3.1-Tulu-3-8B |
|
library_name: transformers |
|
tags: |
|
- llama-cpp |
|
- gguf-my-repo |
|
--- |
|
|
|
# Triangle104/Llama-3.1-Tulu-3-8B-Q5_K_M-GGUF |
|
This model was converted to GGUF format from [`allenai/Llama-3.1-Tulu-3-8B`](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
|
Refer to the [original model card](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) for more details on the model. |
|
|
|
--- |
|
## Model details

Tülu3 is a leading instruction-following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques. Tülu3 is designed for state-of-the-art performance on a diverse range of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
|
### Model description

- Model type: A model trained on a mix of publicly available, synthetic and human-created datasets.
- Language(s) (NLP): Primarily English
- License: Llama 3.1 Community License Agreement
- Finetuned from model: allenai/Llama-3.1-Tulu-3-8B-DPO
|
### Model Sources

- Training Repository: https://github.com/allenai/open-instruct
- Eval Repository: https://github.com/allenai/olmes
- Paper: https://arxiv.org/abs/2411.15124
- Demo: https://playground.allenai.org/
|
### Using the model

#### Loading with HuggingFace

To load the model with HuggingFace, use the following snippet:

```python
from transformers import AutoModelForCausalLM

# Loads the original allenai checkpoint from the Hub, not the GGUF file in this repo.
tulu_model = AutoModelForCausalLM.from_pretrained("allenai/Llama-3.1-Tulu-3-8B")
```
|
#### VLLM

As a Llama base model, the model can be easily served with:

```bash
vllm serve allenai/Llama-3.1-Tulu-3-8B
```

Note that given the long chat template of Llama, you may want to use `--max_model_len=8192`.
|
#### Chat template

The chat template for our models is formatted as:

```
<|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

Or with new lines expanded:

```
<|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

It is embedded within the tokenizer as well, for `tokenizer.apply_chat_template`.
|
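The format above can be reproduced by hand with a few lines of Python, which may help when building prompts for a runtime that does not apply the tokenizer's template automatically. This is a minimal sketch, not the tokenizer's actual template: the function name `render_tulu_chat` is hypothetical, only the `user` and `assistant` roles shown in this card are handled, and the way consecutive turns are joined in a multi-turn conversation is an assumption. When in doubt, `tokenizer.apply_chat_template` remains authoritative.

```python
from typing import Dict, List

EOS_TOKEN = "<|endoftext|>"  # end-of-turn marker shown in the card's example


def render_tulu_chat(messages: List[Dict[str, str]],
                     add_generation_prompt: bool = False) -> str:
    """Render user/assistant turns in the format documented in this card.

    Assumption: roles other than "user" and "assistant" are rejected because
    the card does not document how they are formatted.
    """
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "user":
            parts.append(f"<|user|>\n{content}\n")
        elif role == "assistant":
            parts.append(f"<|assistant|>\n{content}{EOS_TOKEN}")
        else:
            raise ValueError(f"role not covered by this sketch: {role!r}")
    if add_generation_prompt:
        # Leave an open assistant tag so the model writes the next turn.
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = render_tulu_chat(
    [{"role": "user", "content": "How are you doing?"}],
    add_generation_prompt=True,
)
print(prompt)
```

Passing `add_generation_prompt=True` ends the string with an open `<|assistant|>` tag, which is the usual way to ask the model for the next reply.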
|
#### System prompt

In Ai2 demos, we use this system prompt by default:

```
You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.
```

The model has not been trained with a specific system prompt in mind.
|
### Bias, Risks, and Limitations

The Tülu3 models have limited safety training and, unlike ChatGPT, are not deployed automatically with in-the-loop filtering of responses, so the model can produce problematic outputs (especially when prompted to do so). The size and composition of the corpus used to train the base Llama 3.1 models are also unknown; however, it is likely to have included a mix of Web data and technical sources like books and code. See the Falcon 180B model card for an example of this.
|
### Hyperparameters

PPO settings for RLVR:

- Learning Rate: 3 × 10⁻⁷
- Discount Factor (gamma): 1.0
- Generalized Advantage Estimation (lambda): 0.95
- Mini-batches (N_mb): 1
- PPO Update Iterations (K): 4
- PPO's Clipping Coefficient (epsilon): 0.2
- Value Function Coefficient (c1): 0.1
- Gradient Norm Threshold: 1.0
- Learning Rate Schedule: Linear
- Generation Temperature: 1.0
- Batch Size (effective): 512
- Max Token Length: 2,048
- Max Prompt Token Length: 2,048
- Penalty Reward Value for Responses without an EOS Token: -10.0
- Response Length: 1,024 (but 2,048 for MATH)
- Total Episodes: 100,000
- KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
- Warm up ratio (omega): 0.0
|
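To make the roles of gamma, lambda, and epsilon concrete, the sketch below plugs the listed values into the textbook formulas for Generalized Advantage Estimation and the PPO clipped objective. It is an illustration under stated assumptions, not the open-instruct training code: the function names and the three-step toy episode are hypothetical, and the episode is assumed to terminate so that the value after the last step is zero.

```python
from typing import List

GAMMA = 1.0    # discount factor listed in this card
LAMBDA = 0.95  # GAE lambda listed in this card
EPSILON = 0.2  # PPO clipping coefficient listed in this card


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = GAMMA, lam: float = LAMBDA) -> List[float]:
    """Textbook GAE over one finished episode (value after the last step is 0)."""
    advantages = [0.0] * len(rewards)
    next_value = 0.0
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio: float, advantage: float,
                      epsilon: float = EPSILON) -> float:
    """PPO clipped objective for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical toy episode: the reward arrives only at the final step.
rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.5, 0.5]
print(gae_advantages(rewards, values))
print(clipped_surrogate(ratio=1.5, advantage=1.0))
```

With gamma set to 1.0, a reward at the end of a response is not discounted on its way back to earlier tokens; only lambda attenuates the advantage assigned to them.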
|
### License and use

All Llama 3.1 Tülu3 models are released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. Tülu3 is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

The models have been fine-tuned using a dataset mix with outputs generated from third-party models and are subject to additional terms: Gemma Terms of Use and Qwen License Agreement (models were improved using Qwen 2.5).
|
### Citation

If Tülu3 or any of the related materials were helpful to your work, please cite:

```
@article{lambert2024tulu3,
  title = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
  author = {
    Nathan Lambert and
    Jacob Morrison and
    Valentina Pyatkin and
    Shengyi Huang and
    Hamish Ivison and
    Faeze Brahman and
    Lester James V. Miranda and
    Alisa Liu and
    Nouha Dziri and
    Shane Lyu and
    Yuling Gu and
    Saumya Malik and
    Victoria Graf and
    Jena D. Hwang and
    Jiangjiang Yang and
    Ronan Le Bras and
    Oyvind Tafjord and
    Chris Wilhelm and
    Luca Soldaini and
    Noah A. Smith and
    Yizhong Wang and
    Pradeep Dasigi and
    Hannaneh Hajishirzi
  },
  year = {2024},
  email = {tulu@allenai.org}
}
```
|
--- |
|
## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```

Invoke the llama.cpp server or the CLI.
|
### CLI:

```bash
llama-cli --hf-repo Triangle104/Llama-3.1-Tulu-3-8B-Q5_K_M-GGUF --hf-file llama-3.1-tulu-3-8b-q5_k_m.gguf -p "The meaning to life and the universe is"
```

### Server:

```bash
llama-server --hf-repo Triangle104/Llama-3.1-Tulu-3-8B-Q5_K_M-GGUF --hf-file llama-3.1-tulu-3-8b-q5_k_m.gguf -c 2048
```
|
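Once the server is running, it can be queried over HTTP. The sketch below uses only the Python standard library and assumes the server listens on `http://localhost:8080` (llama-server's usual default) and exposes the OpenAI-style `/v1/chat/completions` route. The function names, `max_tokens`, and `temperature` values are illustrative choices, not settings recommended by the model authors.

```python
import json
import urllib.request
from typing import Dict, List

# Default system prompt quoted in this card for Ai2 demos.
SYSTEM_PROMPT = (
    "You are Tulu 3, a helpful and harmless AI Assistant "
    "built by the Allen Institute for AI."
)


def build_chat_payload(user_message: str, system_prompt: str = SYSTEM_PROMPT,
                       max_tokens: int = 256) -> Dict:
    """Build an OpenAI-style chat completion request body."""
    messages: List[Dict[str, str]] = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
    # temperature is an arbitrary illustrative value
    return {"messages": messages, "max_tokens": max_tokens, "temperature": 0.7}


def chat(user_message: str, base_url: str = "http://localhost:8080") -> str:
    """Send one chat turn to a running llama-server and return the reply text.

    Assumption: the server is reachable at base_url and serves
    /v1/chat/completions.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```

With the server from the command above running, `print(chat("How are you doing?"))` should print the model's reply. Because the server is started with `-c 2048`, the prompt and the generated tokens together need to fit in a 2,048-token context.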
|
|
Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.
|
|
|
Step 1: Clone llama.cpp from GitHub.

```
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).

```
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.

```
./llama-cli --hf-repo Triangle104/Llama-3.1-Tulu-3-8B-Q5_K_M-GGUF --hf-file llama-3.1-tulu-3-8b-q5_k_m.gguf -p "The meaning to life and the universe is"
```

or

```
./llama-server --hf-repo Triangle104/Llama-3.1-Tulu-3-8B-Q5_K_M-GGUF --hf-file llama-3.1-tulu-3-8b-q5_k_m.gguf -c 2048
```
|
|