DiscoResearch/Llama3-German-8B
Text Generation
Continued pretraining of Llama3 8B to improve its German-language capabilities. This collection contains base models, fine-tuned models, and variants.
Note Base model produced by continued pretraining on 65B tokens of high-quality German text.
Note Same as above, with an additional 100M tokens of pretraining on 32k-token texts and rope_theta=1.5e6 to improve long-context capabilities.
Note DiscoResearch/Llama3_German_8B fine-tuned on our DiscoLM German Instruction dataset.
Note DiscoResearch/Llama3_German_8B_32k fine-tuned on our DiscoLM German Instruction dataset.
Note Experimental merge of meta-llama/Meta-Llama-3-8B and DiscoResearch/Llama3_DiscoLeo_Instruct_8B_v0.1 using DARE-TIES with a 0.5:0.5 ratio.
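
For readers unfamiliar with DARE-TIES, the idea behind the merge in the last note can be sketched on toy data. This is a rough illustration of one common formulation, not the configuration or tooling used for the released model. It assumes a shared base and flat lists of floats standing in for weight tensors. The function names (dare, ties_sign_merge, dare_ties_merge), the drop probability, and the seed are all hypothetical. Only the 0.5:0.5 weighting comes from the note above.

```python
import random

def dare(delta, drop_prob, rng):
    """DARE step: randomly drop delta entries, rescale survivors by 1/(1 - drop_prob)."""
    keep = 1.0 - drop_prob
    return [d / keep if rng.random() < keep else 0.0 for d in delta]

def ties_sign_merge(deltas, weights):
    """TIES-style step: elect a sign per parameter, then average only agreeing deltas."""
    merged = []
    for values in zip(*deltas):
        total = sum(w * v for w, v in zip(weights, values))
        sign = 1.0 if total >= 0 else -1.0
        # Keep only the contributions whose sign matches the elected sign.
        agree = [(w, v) for w, v in zip(weights, values) if v * sign > 0]
        norm = sum(w for w, _ in agree)
        merged.append(sum(w * v for w, v in agree) / norm if norm else 0.0)
    return merged

def dare_ties_merge(base, finetuned, weights, drop_prob=0.5, seed=0):
    """Merge several fine-tuned parameter vectors back onto a shared base."""
    rng = random.Random(seed)
    deltas = [
        dare([f - b for f, b in zip(model, base)], drop_prob, rng)
        for model in finetuned
    ]
    merged_delta = ties_sign_merge(deltas, weights)
    return [b + d for b, d in zip(base, merged_delta)]

# Toy example: two hypothetical variants merged at equal weight (0.5:0.5).
base = [0.0, 0.0, 0.0]
variant_a = [1.0, -1.0, 2.0]
variant_b = [3.0, 1.0, -2.0]
print(dare_ties_merge(base, [variant_a, variant_b], [0.5, 0.5]))
```

Real merges operate on full model checkpoints, tensor by tensor. The toy vectors here only show how random dropping, rescaling, and sign election interact.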