---
tags:
- merge
- mergekit
- nbeerbower/llama-3-wissenschaft-8B-v2
license: llama3
language:
- en
- de
---

# llama3-8b-spaetzle-v20

llama3-8b-spaetzle-v20 is an int4-inc (Intel auto-round) quantized merge of the following models:

* [cstr/llama3-8b-spaetzle-v13](https://huggingface.co/cstr/llama3-8b-spaetzle-v13)
* [Azure99/blossom-v5-llama3-8b](https://huggingface.co/Azure99/blossom-v5-llama3-8b)
* [VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct)
* [nbeerbower/llama-3-wissenschaft-8B-v2](https://huggingface.co/nbeerbower/llama-3-wissenschaft-8B-v2)

## Benchmarks

The GGUF q4_k_m version achieves 65.7 on EQ-Bench v2_de (171/171 parseable).

From [Intel's low-bit open LLM leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard):

| Type | Model | Average ⬆️ | ARC-c | ARC-e | Boolq | HellaSwag | Lambada | MMLU | Openbookqa | Piqa | Truthfulqa | Winogrande | #Params (B) | #Size (G) |
|------|-------|------------|-------|-------|-------|-----------|---------|------|------------|------|------------|------------|-------------|-----------|
| 🍒 | **cstr/llama3-8b-spaetzle-v20-int4-inc** | **66.43** | **61.77** | **85.4** | **82.75** | **62.79** | **71.73** | **64.17** | **37.4** | **80.41** | **43.21** | **74.66** | **7.04** | **5.74** |

## 🧩 Configuration

```yaml
models:
  - model: cstr/llama3-8b-spaetzle-v13
    # no parameters necessary for base model
  - model: nbeerbower/llama-3-wissenschaft-8B-v2
    parameters:
      density: 0.65
      weight: 0.4
merge_method: dare_ties
base_model: cstr/llama3-8b-spaetzle-v13
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```

A sketch of running this merge and the subsequent int4 auto-round quantization is given in the reproduction notes at the end of this card.

## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/llama3-8b-spaetzle-v20"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
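
## Reproduction notes

The merge uses the DARE-TIES configuration shown above with mergekit. The following is a minimal sketch (not necessarily the exact invocation used), assuming the configuration is saved as `config.yaml` and using mergekit's Python API; the output path and options here are illustrative placeholders:

```python
# Sketch: run the DARE-TIES merge from config.yaml with mergekit.
# Assumes the mergekit package is installed; paths and options are illustrative.
import yaml
import torch
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./llama3-8b-spaetzle-v20",   # output directory for the merged model
    options=MergeOptions(
        cuda=torch.cuda.is_available(),    # use a GPU if one is available
        copy_tokenizer=True,               # copy the tokenizer from the base model
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```

The same configuration can also be run with mergekit's `mergekit-yaml` command-line tool.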
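
The int4-inc weights were produced with Intel's auto-round. A minimal sketch of such a quantization run, assuming the `auto-round` package's `AutoRound` API; the bit width matches the card, while group size and other settings are assumed defaults rather than the exact values used here:

```python
# Sketch: int4 (Intel auto-round) quantization of the merged model.
# bits=4 matches the card; group_size is an assumed default, not a confirmed setting.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_path = "./llama3-8b-spaetzle-v20"   # merged model from the previous step
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_path)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./llama3-8b-spaetzle-v20-int4-inc")
```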