metadata
license: cc-by-nc-4.0
tags:
  - merge
  - lazymergekit
dataset:
  - mlabonne/truthy-dpo-v0.1
  - mlabonne/distilabel-intel-orca-dpo-pairs
base_model:
  - mlabonne/Monarch-7B
language:
  - en

👑 NeuralMonarch-7B

NeuralMonarch-7B is a DPO fine-tune of mlabonne/Monarch-7B using the jondurbin/truthy-dpo-v0.1 and argilla/distilabel-intel-orca-dpo-pairs preference datasets.

It is based on a merge of the following models using LazyMergekit:

Special thanks to Jon Durbin, Intel, and Argilla for the preference datasets.

πŸ” Applications

This model uses an 8k context window. It is compatible with several chat templates, including ChatML and the Llama and Mistral chat templates.
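To make the ChatML option concrete, here is a minimal sketch of how a conversation is laid out in that format. The `to_chatml` helper is hypothetical and written only for illustration; it assumes the standard `<|im_start|>` / `<|im_end|>` markers, and this card does not state which template the model's tokenizer ships by default.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration; not part of any library.
    """
    # Each turn is wrapped as: <|im_start|>ROLE\nCONTENT<|im_end|>\n
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    # Optionally open an assistant turn so the model continues from there
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What is a large language model?"}]
print(to_chatml(messages))
```

In practice, `tokenizer.apply_chat_template` is the safer route, since it applies whatever template is stored with the tokenizer rather than a hand-written one.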

Compared to other 7B models, it performs well on instruction-following and reasoning tasks. It can also be used for role-play (RP) and storytelling.

⚡ Quantized models

  • GGUF: TBD

πŸ† Evaluation

The evaluation was performed using LLM AutoEval on the Nous suite. See the entire leaderboard here.

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---:|---:|---:|---:|---:|
| Monarch-7B 📄 | 62.68 | 45.48 | 77.07 | 78.04 | 50.14 |
| teknium/OpenHermes-2.5-Mistral-7B 📄 | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
| mlabonne/NeuralHermes-2.5-Mistral-7B 📄 | 53.51 | 43.67 | 73.24 | 55.37 | 41.76 |
| mlabonne/NeuralBeagle14-7B 📄 | 60.25 | 46.06 | 76.77 | 70.32 | 47.86 |
| eren23/dpo-binarized-NeuralTrix-7B 📄 | 62.5 | 44.57 | 76.34 | 79.81 | 49.27 |
| CultriX/NeuralTrix-7B-dpo 📄 | 62.5 | 44.61 | 76.33 | 79.8 | 49.24 |
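The Average column appears to be the plain arithmetic mean of the four benchmark scores. That rule is inferred from the numbers in the table, not stated in this card, so treat the following as a quick sanity check rather than a description of how LLM AutoEval computes it. The `mean_of_benchmarks` helper is hypothetical.

```python
# Scores copied from the table: (Average, AGIEval, GPT4All, TruthfulQA, Bigbench)
scores = {
    "Monarch-7B": (62.68, 45.48, 77.07, 78.04, 50.14),
    "teknium/OpenHermes-2.5-Mistral-7B": (52.42, 42.75, 72.99, 52.99, 40.94),
    "mlabonne/NeuralHermes-2.5-Mistral-7B": (53.51, 43.67, 73.24, 55.37, 41.76),
    "mlabonne/NeuralBeagle14-7B": (60.25, 46.06, 76.77, 70.32, 47.86),
    "eren23/dpo-binarized-NeuralTrix-7B": (62.5, 44.57, 76.34, 79.81, 49.27),
    "CultriX/NeuralTrix-7B-dpo": (62.5, 44.61, 76.33, 79.8, 49.24),
}


def mean_of_benchmarks(row):
    """Arithmetic mean of the four benchmark scores (assumed averaging rule)."""
    _reported, *benchmarks = row
    return sum(benchmarks) / len(benchmarks)


for name, row in scores.items():
    reported = row[0]
    computed = mean_of_benchmarks(row)
    # Compare with a tolerance, since the table values are rounded
    status = "ok" if abs(computed - reported) < 0.01 else "mismatch"
    print(f"{name}: reported {reported}, computed {computed:.4f} ({status})")
```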

💻 Usage

# Notebook cell (Jupyter/Colab); in a shell, run the same command without the leading "!"
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

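# Hugging Face model ID and a single-turn conversation
model = "mlabonne/NeuralMonarch-7B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the chat template stored in the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Build a text-generation pipeline in half precision with automatic device placement
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens and print the prompt plus completion
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])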