No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 5 days ago • 35
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 3 days ago • 83
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 1 day ago • 67
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 8 days ago • 67
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Paper • 2412.13018 • Published 4 days ago • 39
Falcon3 Collection Falcon3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. • 40 items • Updated 1 day ago • 68
Evaluating Language Models as Synthetic Data Generators Paper • 2412.03679 • Published 16 days ago • 43
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices Paper • 2411.10640 • Published Nov 16 • 44
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated 19 days ago • 194
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 11 items • Updated Apr 3 • 114
OLMoE Collection Artifacts for open mixture-of-experts language models. • 13 items • Updated 23 days ago • 27
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14 • 52
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31 • 75
Bad Data Toolbox Collection PleIAs collection of models for the data processing of challenging document and data sources. • 5 items • Updated Jul 18 • 15
Article Advanced RAG: Fine-Tune Embeddings from HuggingFace for RAG By lucifertrj • Jul 5 • 4