---
base_model:
- Meta-Llama-3.1-8B-Instruct
tags:
- merge
- mergekit
license: llama3.1
language:
- en
- de
---

# llama3-8b-spaetzle-v51

This is only a quick experiment in merging Llama 3 and Llama 3.1 models, despite a number of differences in tokenizer setup, among other things. It is also motivated by ongoing problems with Llama 3.1, especially under llama.cpp: BOS handling, looping, still-missing full RoPE scaling support, and so on. Performance is not yet satisfactory, of course, which might have a number of causes.

The merge includes:

* [sparsh35/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/sparsh35/Meta-Llama-3.1-8B-Instruct)

## 🧩 Configuration

```yaml
models:
  - model: cstr/llama3-8b-spaetzle-v34
    # no parameters necessary for base model
  - model: sparsh35/Meta-Llama-3.1-8B-Instruct
    parameters:
      density: 0.65
      weight: 0.5
merge_method: dare_ties
base_model: cstr/llama3-8b-spaetzle-v34
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```

## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/llama3-8b-spaetzle-v51"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
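
Given the stopping and looping issues mentioned above, it may help to pass explicit termination tokens to generation. The following is a minimal sketch, reusing `pipeline`, `tokenizer`, and `prompt` from the snippet above, and assuming the merged tokenizer still defines the Llama-3-style `<|eot_id|>` end-of-turn token:

```python
# Sketch: stop generation on either <|end_of_text|> or <|eot_id|>.
# Assumption: the merged tokenizer keeps the Llama-3-style <|eot_id|> token;
# check tokenizer.convert_tokens_to_ids("<|eot_id|>") returns a valid id first.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    eos_token_id=terminators,              # stop on either terminator
    pad_token_id=tokenizer.eos_token_id,   # avoid the missing-pad-token warning
)
print(outputs[0]["generated_text"])
```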