Article: SauerkrautLM's Multi-Phase Spectrum Training: A Technical Deep Dive • by DavidGF
🇫🇷 Calme-3 Collection • Here you can find all the new Calme-3 models • 26 items
VAGO solutions quants Collection • Quantized versions of the excellent German-language models created by VAGO solutions • 6 items
Qwen2 Collection • Qwen2 language models, with pretrained and instruction-tuned models in five sizes (0.5B, 1.5B, 7B, 57B-A14B, and 72B) • 39 items
Dataset comparison models Collection • 1.8B models trained on 350BT to compare different pretraining datasets • 8 items
Article: LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) • by wolfram
Meta Llama 3 Collection • This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items
🇩🇪 German SFT and DPO datasets Collection • Datasets that can be used for LLM training with axolotl, trl or llama_factory • 32 items
Paper: Arcee's MergeKit: A Toolkit for Merging Large Language Models • arXiv:2403.13257 • published Mar 20, 2024
Paper: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • arXiv:2402.17764 • published Feb 27, 2024
Paper: Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens • arXiv:2401.17377 • published Jan 30, 2024
Paper: SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling • arXiv:2312.15166 • published Dec 23, 2023