---
base_model: DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1b
pipeline_tag: text-generation
inference: false
model_creator: DiscoResearch
model_name: Llama3-DiscoLeo-Instruct-8B-32k-v0.1b
model_type: llama3
language:
- de
library_name: transformers
license: llama3
quantized_by: ThiloteE
tags:
- text-generation-inference
- transformers
- GGUF
- GPT4All-community
- GPT4All
- conversational
- merge
---
> [!NOTE]
> This model is expected to perform well but may need more testing and user feedback. Note that only models featured within the GPT4All GUI are curated and officially supported by Nomic. Use at your own risk.
# About
<!-- ### quantize_version: 3 -->
<!-- ### convert_type: hf -->
- Static quants of DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1b at commit [21214b3](https://huggingface.co/DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1/commit/21214b3dc9cec9485641a4cbd5ba75867a774605)
- Quantized by [ThiloteE](https://huggingface.co/ThiloteE) with llama.cpp commit [e09a800](https://github.com/ggerganov/llama.cpp/commit/e09a800f9a9b19c73aa78e03b4c4be8ed988f3e6)
# Prompt Template (for GPT4All)
Example System Prompt:
```Markdown
<|start_header_id|>system<|end_header_id|>
Die folgende Anweisung gibt einen Text vor und fordert Sie auf, eine angemessene Antwort zu formulieren. Bitte geben Sie Ihre Antwort.<|eot_id|>
```
Chat Template:
```Markdown
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
%1<|eot_id|><|begin_of_text|><|start_header_id|>assistant<|end_header_id|>
%2<|eot_id|>
```
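As a rough illustration of how the placeholders behave, the sketch below fills the template in plain Python. It assumes `%1` stands for the user message and `%2` for the model's reply, as their positions in the template suggest; `render_turn` is a hypothetical helper and not part of GPT4All, which performs this substitution internally.

```python
# Hypothetical helper for illustration only; not a GPT4All API.
CHAT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n"
    "%1<|eot_id|><|begin_of_text|><|start_header_id|>assistant<|end_header_id|>\n"
    "%2<|eot_id|>"
)


def render_turn(user_message: str, assistant_reply: str = "") -> str:
    """Fill %1 with the user message and %2 with the assistant reply."""
    return CHAT_TEMPLATE.replace("%1", user_message).replace("%2", assistant_reply)


prompt = render_turn("Was ist die Hauptstadt von Hessen?", "Wiesbaden.")
print(prompt)
```

During generation `%2` is empty at first, so the prompt ends right after the assistant header and the model writes the reply itself.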
# Context Length
`32768`
Use a lower value during inference if you do not have enough RAM or VRAM.
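To get a feel for why a smaller context saves memory, here is a back-of-the-envelope estimate of the key/value cache size. It assumes the standard Llama 3 8B architecture (32 layers, 8 key/value heads, head dimension 128) and a 16-bit cache; none of these values are stated in this card, and actual usage in GPT4All or llama.cpp may differ depending on settings.

```python
# Assumed Llama 3 8B architecture values; not stated in this model card.
N_LAYERS = 32
N_KV_HEADS = 8       # grouped-query attention
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # assumes a 16-bit KV cache


def kv_cache_bytes(n_ctx: int) -> int:
    """Bytes needed to cache keys and values for n_ctx tokens."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE  # keys + values
    return per_token * n_ctx


for n_ctx in (2048, 8192, 32768):
    print(f"n_ctx={n_ctx:>6}: {kv_cache_bytes(n_ctx) / 1024**3:.2f} GiB")
```

Under these assumptions the cache alone takes about 4 GiB at the full 32768-token context and about 1 GiB at 8192 tokens, in addition to the model weights.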
# Provided Quants
| Link | Type | Size/GB | Notes |
|:-----|:-----|--------:|:------|
| [GGUF](https://huggingface.co/GPT4All-Community/Llama3-DiscoLeo-Instruct-8B-32k-v0.1/resolve/main/Llama3-DiscoLeo-Instruct-8B-32k-v0.1-Q4_0.gguf) | Q4_0 | 4.66 | fast, recommended |
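The listed size can be sanity-checked with a little arithmetic. The sketch below assumes llama.cpp's Q4_0 layout (blocks of 32 weights stored as 16 bytes of 4-bit values plus a 2-byte scale) and roughly 8.03 billion parameters for Llama 3 8B; both figures are assumptions that this card does not state.

```python
BLOCK_WEIGHTS = 32    # weights per Q4_0 block (assumed llama.cpp layout)
BLOCK_BYTES = 16 + 2  # 32 four-bit values plus one 16-bit scale
N_PARAMS = 8.03e9     # approximate parameter count of Llama 3 8B (assumption)

bits_per_weight = BLOCK_BYTES * 8 / BLOCK_WEIGHTS
estimated_gb = N_PARAMS * bits_per_weight / 8 / 1e9

print(f"bits per weight: {bits_per_weight}")
print(f"estimated size:  {estimated_gb:.2f} GB")
```

The estimate lands slightly below the 4.66 in the table, which would be consistent with a few tensors being stored at higher precision than Q4_0.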
# About GGUF
If you are unsure how to use GGUF files, refer to one of [TheBloke's
READMEs](https://huggingface.co/TheBloke/DiscoLM_German_7b_v1-GGUF) for
more details, including on how to concatenate multi-part files.
Here is a handy graph by ikawrakow comparing some quant types (lower is better):
![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)
And here are Artefact2's thoughts on the matter:
https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9
# Thanks
I thank Mradermacher and TheBloke for the inspiration for this model card and for their contributions to open source, and 3Simplex for lots of help along the way.
Shoutout to the GPT4All and llama.cpp communities :-)
------
<!-- footer end -->
<!-- original-model-card start -->
------
------
# Original Model card:
---
license: llama3
language:
- de
library_name: transformers
---
# Llama3-DiscoLeo-Instruct 8B 32k-context (version 0.1)
## Thanks and Accreditation
[DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729)
is the result of a joint effort between [DiscoResearch](https://huggingface.co/DiscoResearch) and [Occiglot](https://huggingface.co/occiglot)
with support from the [DFKI](https://www.dfki.de/web/) (German Research Center for Artificial Intelligence) and [hessian.Ai](https://hessian.ai).
Occiglot kindly handled data preprocessing, filtering, and deduplication as part of their latest [dataset release](https://huggingface.co/datasets/occiglot/occiglot-fineweb-v0.5), as well as sharing their compute allocation at hessian.Ai's 42 Supercomputer.
## Model Overview
DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1 is an instruction tuned version of our long-context [Llama3-German-8B-32k](https://huggingface.co/DiscoResearch/Llama3_German_8B_32k).
The base model was derived from [Meta's Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) through continued pretraining on 65 billion high-quality German tokens, similar to previous [LeoLM](https://huggingface.co/LeoLM) or [Occiglot](https://huggingface.co/collections/occiglot/occiglot-eu5-7b-v01-65dbed502a6348b052695e01) models.
For the long-context version we trained on an additional 100 million tokens at 32k context length, using a rope_theta value of 1.5e6 and a learning rate of 1.5e-5 with a batch size of 256*8192 and otherwise equal hyperparameters to the base model.
We finetuned this checkpoint on the German Instruction dataset from DiscoResearch created by [Jan-Philipp Harries](https://huggingface.co/jphme) and [Daniel Auras](https://huggingface.co/rasdani) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)).
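To illustrate what the larger `rope_theta` changes, the sketch below computes rotary-embedding wavelengths with the standard RoPE formula. It assumes a head dimension of 128 and compares against a `rope_theta` of 500,000, which is the Llama 3 default as far as we know; `rope_wavelengths` is a hypothetical helper, not code from the training setup.

```python
import math

HEAD_DIM = 128  # assumed per-head dimension for Llama 3 8B


def rope_wavelengths(theta: float, head_dim: int = HEAD_DIM) -> list[float]:
    """Wavelength in tokens of each rotary frequency pair: 2*pi * theta**(2i/d)."""
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]


default = rope_wavelengths(500_000.0)  # assumed Llama 3 default
extended = rope_wavelengths(1.5e6)     # value reported for the 32k model

print(f"longest wavelength, theta=5e5:   {default[-1]:,.0f} tokens")
print(f"longest wavelength, theta=1.5e6: {extended[-1]:,.0f} tokens")
```

A larger `rope_theta` stretches the low-frequency wavelengths, which is the usual motivation for raising it when extending the context window.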
## How to use
Llama3_DiscoLeo_Instruct_8B_32k_v0.1 uses the [Llama-3 chat template](https://github.com/meta-llama/llama3?tab=readme-ov-file#instruction-tuned-models), which works with [transformers' chat templating](https://huggingface.co/docs/transformers/main/en/chat_templating).
See [below](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1#usage-example) for a usage example.
## Model Training and Hyperparameters
The model was fully fine-tuned with axolotl on the [hessian.Ai 42](https://hessian.ai) supercomputer with a context length of 32,768, a learning rate of 2e-5, and a batch size of 16.
## Evaluation and Results
We evaluated the model using a suite of common English benchmarks and their German counterparts with [GermanBench](https://github.com/bjoernpl/GermanBenchmark).
The image and table below show the benchmark scores for the different instruct models compared to Meta's instruct version. All checkpoints are available in this [collection](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729).
![instruct scores](instruct_model_benchmarks.png)
| Model | truthful_qa_de | truthfulqa_mc | arc_challenge | arc_challenge_de | hellaswag | hellaswag_de | MMLU | MMLU-DE | mean |
|----------------------------------------------------|----------------|---------------|---------------|------------------|-------------|--------------|-------------|-------------|-------------|
| meta-llama/Meta-Llama-3-8B-Instruct | 0.47498 | 0.43923 | **0.59642** | 0.47952 | **0.82025** | 0.60008 | **0.66658** | 0.53541 | 0.57656 |
| DiscoResearch/Llama3-German-8B | 0.49499 | 0.44838 | 0.55802 | 0.49829 | 0.79924 | 0.65395 | 0.62240 | 0.54413 | 0.57743 |
| DiscoResearch/Llama3-German-8B-32k | 0.48920 | 0.45138 | 0.54437 | 0.49232 | 0.79078 | 0.64310 | 0.58774 | 0.47971 | 0.55982 |
| DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1 | **0.53042** | 0.52867 | 0.59556 | **0.53839** | 0.80721 | 0.66440 | 0.61898 | 0.56053 | **0.60552** |
| **DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1** | 0.52749 | **0.53245** | 0.58788 | 0.53754 | 0.80770 | **0.66709** | 0.62123 | **0.56238** | 0.60547 |
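The `mean` column appears to be the unweighted average of the eight benchmark scores in each row. The short check below recomputes it for two rows copied from the table; the averaging rule is an inference from the numbers, not something the card states.

```python
# Scores copied from the table above, in column order (mean column excluded).
SCORES = {
    "meta-llama/Meta-Llama-3-8B-Instruct": [
        0.47498, 0.43923, 0.59642, 0.47952, 0.82025, 0.60008, 0.66658, 0.53541,
    ],
    "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1": [
        0.52749, 0.53245, 0.58788, 0.53754, 0.80770, 0.66709, 0.62123, 0.56238,
    ],
}

# Unweighted average per model, rounded to the table's five decimal places.
means = {model: round(sum(s) / len(s), 5) for model, s in SCORES.items()}

for model, mean in means.items():
    print(f"{model}: {mean:.5f}")
```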
## Model Configurations
We release DiscoLeo-8B in the following configurations:
1. [Base model with continued pretraining](https://huggingface.co/DiscoResearch/Llama3-German_8B)
2. [Long-context version (32k context length)](https://huggingface.co/DiscoResearch/Llama3_German_8B_32k)
3. [Instruction-tuned version of the base model](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_v0.1)
4. [Instruction-tuned version of the long-context model](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1) (This model)
5. [Experimental `DARE-TIES` Merge with Llama3-Instruct](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_8B_DARE_Experimental)
6. [Collection of Quantized versions](https://huggingface.co/collections/DiscoResearch/discoleo-8b-quants-6651bcf8f72c9a37ce485d42)
## Usage Example
Here's how to use the model with transformers:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1"

# device_map="auto" places the weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft"
messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": prompt},
]

# Render the conversation with the Llama-3 chat template, ending with the assistant header.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
# Move the inputs to wherever the model was placed instead of hard-coding "cuda".
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,  # passes input_ids and attention_mask
    max_new_tokens=512,
)
# Strip the prompt tokens so that only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
## Acknowledgements
The model was trained and evaluated by [Björn Plüster](https://huggingface.co/bjoernp) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)) with data preparation and project supervision by [Manuel Brack](http://manuel-brack.eu) ([DFKI](https://www.dfki.de/web/), [TU-Darmstadt](https://www.tu-darmstadt.de/)). Initial work on dataset collection and curation was performed by [Malte Ostendorff](https://ostendorff.org) and [Pedro Ortiz Suarez](https://portizs.eu). Instruction tuning was done with the DiscoLM German dataset created by [Jan-Philipp Harries](https://huggingface.co/jphme) and [Daniel Auras](https://huggingface.co/rasdani) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)). We extend our gratitude to [LAION](https://laion.ai/) and friends, especially [Christoph Schuhmann](https://entwickler.de/experten/christoph-schuhmann) and [Jenia Jitsev](https://huggingface.co/JJitsev), for initiating this collaboration.
The model training was supported by a compute grant at the [42 supercomputer](https://hessian.ai/) which is a central component in the development of [hessian AI](https://hessian.ai/), the [AI Innovation Lab](https://hessian.ai/infrastructure/ai-innovationlab/) (funded by the [Hessian Ministry of Higher Education, Research and the Art (HMWK)](https://wissenschaft.hessen.de) & the [Hessian Ministry of the Interior, for Security and Homeland Security (HMinD)](https://innen.hessen.de)) and the [AI Service Centers](https://hessian.ai/infrastructure/ai-service-centre/) (funded by the [German Federal Ministry for Economic Affairs and Climate Action (BMWK)](https://www.bmwk.de/Navigation/EN/Home/home.html)).
The curation of the training data is partially funded by the [German Federal Ministry for Economic Affairs and Climate Action (BMWK)](https://www.bmwk.de/Navigation/EN/Home/home.html)
through the project [OpenGPT-X](https://opengpt-x.de/en/) (project no. 68GX21007D).
<!-- original-model-card end -->
<!-- end -->