metadata
base_model: DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1b
pipeline_tag: text-generation
inference: false
model_creator: DiscoResearch
model_name: Llama3-DiscoLeo-Instruct-8B-32k-v0.1b
model_type: llama3
language:
  - de
library_name: transformers
license: llama3
quantized_by: ThiloteE
tags:
  - text-generation-inference
  - transformers
  - GGUF
  - GPT4All-community
  - GPT4All
  - conversational
  - merge

This model is assumed to perform well but may require more testing and user feedback. Be aware that only models featured within the GPT4All GUI are curated and officially supported by Nomic. Use at your own risk.

About

  • Static quants of DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1b at commit 21214b3
  • Quantized by ThiloteE with llama.cpp commit e09a800

Prompt Template (for GPT4All)

Example System Prompt:

<|start_header_id|>system<|end_header_id|>

Die folgende Anweisung gibt einen Text vor und fordert Sie auf, eine angemessene Antwort zu formulieren. Bitte geben Sie Ihre Antwort.<|eot_id|>

Chat Template:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

%1<|eot_id|><|begin_of_text|><|start_header_id|>assistant<|end_header_id|>

%2<|eot_id|>
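
As a rough illustration of how the two placeholders are filled, the sketch below substitutes a user message for %1 and a reply for %2. It assumes, following GPT4All's legacy template convention, that %1 stands for the user input and %2 for the model response. `render_turn` is a hypothetical helper written for this example, not part of GPT4All.

```python
# Chat template copied verbatim from this model card.
CHAT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "%1<|eot_id|><|begin_of_text|><|start_header_id|>assistant<|end_header_id|>\n\n"
    "%2<|eot_id|>"
)


def render_turn(user_message, assistant_reply=None):
    """Hypothetical helper: fill %1 (user input) and %2 (model reply).

    With assistant_reply=None the text stops right before %2, which is
    the point where the model would start generating.
    """
    head, tail = CHAT_TEMPLATE.split("%2")
    head = head.replace("%1", user_message)
    if assistant_reply is None:
        return head
    return head + assistant_reply + tail


# German question: "What is the capital of Hesse?"
open_turn = render_turn("Wie heißt die Hauptstadt von Hessen?")
closed_turn = render_turn("Wie heißt die Hauptstadt von Hessen?", "Wiesbaden.")
print(closed_turn)
```

The open turn ends with the assistant header, so everything the model emits up to its next <|eot_id|> becomes the reply.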

Context Length

32768

Use a lower value during inference if you do not have enough RAM or VRAM.
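
To get a feel for why a smaller context saves memory, the sketch below estimates the size of the key/value cache at different context lengths. The architecture constants are assumptions taken from the standard Llama-3-8B configuration (they are not stated in this card), and an f16 cache is assumed. Real memory use also includes the model weights and runtime overhead, so treat the numbers as a lower bound.

```python
# Assumed architecture constants for Llama-3-8B (not stated in this card).
N_LAYERS = 32        # transformer blocks
N_KV_HEADS = 8       # grouped-query attention: 8 key/value heads
HEAD_DIM = 128       # 4096 hidden size / 32 attention heads
BYTES_PER_VALUE = 2  # assumes an f16 KV cache


def kv_cache_bytes(n_ctx):
    """Estimate the KV-cache size in bytes for a context of n_ctx tokens."""
    # Factor 2 accounts for storing both keys and values.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * n_ctx


for n_ctx in (4096, 8192, 32768):
    print(f"{n_ctx:>6} tokens -> {kv_cache_bytes(n_ctx) / 2**30:.2f} GiB")
```

Under these assumptions the cache grows linearly with the context length, so cutting the context from 32768 to 8192 tokens reduces it to a quarter.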

Provided Quants

| Link | Type | Size/GB | Notes |
|:-----|:-----|--------:|:------|
| GGUF | Q4_0 | 4.66 | fast, recommended |

About GGUF

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including how to concatenate multi-part files.

Here is a handy graph by ikawrakow comparing some quant types (lower is better):

[Image: graph by ikawrakow comparing quant types]

And here are Artefact2's thoughts on the matter: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9

Thanks

I thank Mradermacher and TheBloke for the inspiration for this model card and for their contributions to open source. Thanks also to 3Simplex for lots of help along the way. Shoutout to the GPT4All and llama.cpp communities :-)




Original Model card:


license: llama3
language:
  - de
library_name: transformers

# Llama3-DiscoLeo-Instruct 8B 32k-context (version 0.1)

Thanks and Accreditation

DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1 is the result of a joint effort between DiscoResearch and Occiglot with support from the DFKI (German Research Center for Artificial Intelligence) and hessian.Ai. Occiglot kindly handled data preprocessing, filtering, and deduplication as part of their latest dataset release, as well as sharing their compute allocation at hessian.Ai's 42 Supercomputer.

Model Overview

DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1 is an instruction-tuned version of our long-context Llama3-German-8B-32k. The base model was derived from Meta's Llama3-8B through continued pretraining on 65 billion high-quality German tokens, similar to previous LeoLM or Occiglot models. For the long-context version, we trained on an additional 100 million tokens at 32k context length, using a rope_theta value of 1.5e6 and a learning rate of 1.5e-5 with a batch size of 256*8192 and otherwise the same hyperparameters as the base model. We fine-tuned this checkpoint on the German Instruction dataset from DiscoResearch created by Jan-Philipp Harries and Daniel Auras (DiscoResearch, ellamind).
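
The long-context figures can be put in proportion with a little arithmetic. The sketch below assumes that the batch size of 256*8192 is measured in tokens per optimizer step and that "32k" means 32768 tokens. Neither is stated explicitly, so the step count is only a rough estimate.

```python
tokens_per_batch = 256 * 8192      # assumed to be tokens per optimizer step
long_context_tokens = 100_000_000  # "an additional 100 million tokens"
context_length = 32 * 1024         # assumed meaning of "32k context length"

# How many full-length sequences fit into one batch, and how many steps
# the additional 100 million tokens would correspond to.
sequences_per_batch = tokens_per_batch // context_length
approx_steps = long_context_tokens / tokens_per_batch

print(tokens_per_batch)        # 2097152
print(sequences_per_batch)     # 64
print(round(approx_steps, 1))  # 47.7
```

Under these assumptions, the long-context stage is short compared with the 65 billion tokens of continued pretraining: fewer than 50 optimizer steps.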

How to use

Llama3-DiscoLeo-Instruct-8B-32k-v0.1 uses the Llama-3 chat template, which can be applied directly with the chat templating in transformers. See below for a usage example.
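
For readers unfamiliar with the format, the following sketch builds a Llama-3 style prompt by hand. `render_llama3_prompt` is a hypothetical helper written for illustration. In practice `tokenizer.apply_chat_template` should be preferred, since the template shipped with the tokenizer is authoritative and may differ in details.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Hypothetical helper that mimics the Llama-3 chat layout."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, and <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header tells the model to answer next.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Hallo!"},
]
rendered = render_llama3_prompt(messages)
print(rendered)
```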

Model Training and Hyperparameters

The model was fully fine-tuned with axolotl on the hessian.Ai 42 supercomputer with a context length of 32,768, a learning rate of 2e-5, and a batch size of 16.

Evaluation and Results

We evaluated the model using a suite of common English benchmarks and their German counterparts with GermanBench.

The image and corresponding table below show the benchmark scores for the different instruct models compared to Meta's instruct version. All checkpoints are available in this collection.

[Image: instruct scores]

| Model | truthful_qa_de | truthfulqa_mc | arc_challenge | arc_challenge_de | hellaswag | hellaswag_de | MMLU | MMLU-DE | mean |
|:------|---------------:|--------------:|--------------:|-----------------:|----------:|-------------:|-----:|--------:|-----:|
| meta-llama/Meta-Llama-3-8B-Instruct | 0.47498 | 0.43923 | 0.59642 | 0.47952 | 0.82025 | 0.60008 | 0.66658 | 0.53541 | 0.57656 |
| DiscoResearch/Llama3-German-8B | 0.49499 | 0.44838 | 0.55802 | 0.49829 | 0.79924 | 0.65395 | 0.62240 | 0.54413 | 0.57743 |
| DiscoResearch/Llama3-German-8B-32k | 0.48920 | 0.45138 | 0.54437 | 0.49232 | 0.79078 | 0.64310 | 0.58774 | 0.47971 | 0.55982 |
| DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1 | 0.53042 | 0.52867 | 0.59556 | 0.53839 | 0.80721 | 0.66440 | 0.61898 | 0.56053 | 0.60552 |
| DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1 | 0.52749 | 0.53245 | 0.58788 | 0.53754 | 0.80770 | 0.66709 | 0.62123 | 0.56238 | 0.60547 |
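
The mean column appears to be the unweighted average of the eight benchmark scores in each row. The short check below recomputes it for two rows, using the values copied from the table above. The variable names are chosen for this example only.

```python
# Scores in column order: truthful_qa_de, truthfulqa_mc, arc_challenge,
# arc_challenge_de, hellaswag, hellaswag_de, MMLU, MMLU-DE.
scores = {
    "meta-llama/Meta-Llama-3-8B-Instruct":
        [0.47498, 0.43923, 0.59642, 0.47952, 0.82025, 0.60008, 0.66658, 0.53541],
    "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1":
        [0.52749, 0.53245, 0.58788, 0.53754, 0.80770, 0.66709, 0.62123, 0.56238],
}
# Values from the "mean" column of the same rows.
reported_mean = {
    "meta-llama/Meta-Llama-3-8B-Instruct": 0.57656,
    "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1": 0.60547,
}

computed_mean = {name: sum(values) / len(values) for name, values in scores.items()}
for name, mean in computed_mean.items():
    print(f"{name}: computed {mean:.5f}, reported {reported_mean[name]:.5f}")
```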

Model Configurations

We release DiscoLeo-8B in the following configurations:

  1. Base model with continued pretraining
  2. Long-context version (32k context length)
  3. Instruction-tuned version of the base model
  4. Instruction-tuned version of the long-context model (this model)
  5. Experimental DARE-TIES merge with Llama3-Instruct
  6. Collection of quantized versions

Usage Example

Here's how to use the model with transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1"

# Load the weights in their stored dtype and let device_map="auto" place them.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# German prompt: "Write an essay on the importance of the energy transition for Germany's economy"
prompt = "Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft"
messages = [
    # German system prompt: "You are a helpful assistant."
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": prompt}
]
# Render the conversation as text and append the assistant header.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Move the inputs to the device that holds the model's first parameters.
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Pass the attention mask along with the input ids.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Acknowledgements

The model was trained and evaluated by Björn Plüster (DiscoResearch, ellamind) with data preparation and project supervision by Manuel Brack (DFKI, TU-Darmstadt). Initial work on dataset collection and curation was performed by Malte Ostendorff and Pedro Ortiz Suarez. Instruction tuning was done with the DiscoLM German dataset created by Jan-Philipp Harries and Daniel Auras (DiscoResearch, ellamind). We extend our gratitude to LAION and friends, especially Christoph Schuhmann and Jenia Jitsev, for initiating this collaboration.

The model training was supported by a compute grant at the 42 supercomputer, which is a central component in the development of hessian AI, the AI Innovation Lab (funded by the Hessian Ministry of Higher Education, Research and the Arts (HMWK) & the Hessian Ministry of the Interior, for Security and Homeland Security (HMinD)) and the AI Service Centers (funded by the German Federal Ministry for Economic Affairs and Climate Action (BMWK)). The curation of the training data is partially funded by the German Federal Ministry for Economic Affairs and Climate Action (BMWK) through the project OpenGPT-X (project no. 68GX21007D).