allenai/tulu-2-dpo-70b

natolambert committed on Nov 13, 2023

Commit bc6c48f (1 parent: 9b9d8d5)

Update README.md

Files changed (1)

README.md +4 -2

@@ -20,7 +20,9 @@ base_model: meta-llama/Llama-2-70b-hf
 # Model Card for Tulu V2 DPO 70B
-Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, and is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) that was trained on on a mix of publicly available, synthetic datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290). We found that removing the in-built alignment of these datasets boosted performance on [MT Bench](https://huggingface.co/spaces/lmsys/mt-bench) and made the model more helpful. However, this means that model is likely to generate problematic text when prompted to do so and should only be used for educational and research purposes. You can find more details in the [technical report](https://arxiv.org/abs/2310.16944).
+Tulu is a series of language models that are trained to act as helpful assistants.
+Tulu V2 DPO 70B, and is a fine-tuned version of Llama 2 that was trained on on a mix of publicly available, synthetic and human datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290).
+This model is a strong alternative to Llama 2 70b Chat.
 ## Model description
@@ -28,7 +30,7 @@ Zephyr is a series of language models that are trained to act as helpful assista
 - **Model type:** The flagship model of a suite of instruction and RLHF tuned chat models on a mix of publicly available, synthetic and human-created datasets.
 - **Language(s) (NLP):** Primarily English
 - **License:** MIT
-- **Finetuned from model:** [meta-llama/Llama-2-70b-hf](https://huggingface.co/ meta-llama/Llama-2-70b-hf)
+- **Finetuned from model:** [meta-llama/Llama-2-70b-hf](https://huggingface.co/meta-llama/Llama-2-70b-hf)
 ### Model Sources
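The updated model card says Tulu V2 DPO 70B was trained with Direct Preference Optimization (DPO), linking the DPO paper. As a rough illustration only, and not the Tulu training code, the sketch below computes the per-pair DPO loss from that paper: -log sigmoid(beta * ((policy log-prob minus reference log-prob of the chosen response) minus (the same quantity for the rejected response))). It assumes you already have the summed token log-probabilities of each response under the policy being trained and under a frozen reference model. The function name `dpo_loss` and the default `beta=0.1` are hypothetical, illustrative choices, not values taken from this commit.

```python
import math


def _neg_log_sigmoid(z: float) -> float:
    """Return -log(sigmoid(z)) = log(1 + exp(-z)) without overflowing for large |z|."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    # For negative z, rewrite as -z + log(1 + exp(z)) so exp() never sees a large argument.
    return -z + math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss (illustrative sketch; beta=0.1 is an assumed default).

    Each argument is the summed log-probability of a full response given the prompt.
    """
    # Implicit reward of each response: how much the policy up-weights it vs. the reference.
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    # The loss pushes the chosen response's log-ratio above the rejected one's.
    logits = beta * (chosen_log_ratio - rejected_log_ratio)
    return _neg_log_sigmoid(logits)


if __name__ == "__main__":
    # Policy favours the chosen response relative to the reference: loss falls below log 2.
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
    # Policy identical to the reference: loss is exactly log 2.
    print(dpo_loss(-11.0, -11.0, -11.0, -11.0))
```

When the policy and reference assign the same log-probabilities, the logit is zero and the loss is log 2; moving probability toward the chosen response (relative to the reference) lowers it, and moving it toward the rejected response raises it. In practice the per-pair losses would be averaged over a batch of preference pairs.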