The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published 1 day ago • 44
Article Ethics and Society Newsletter #6: Building Better AI: The Importance of Data Quality 3 days ago • 14
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models 3 days ago • 89
Article From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate 14 days ago • 23
Article BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks 9 days ago • 28
C4AI Aya 23 Collection Aya 23 is an open weights research release of an instruction fine-tuned model with highly advanced multilingual capabilities. • 3 items • Updated May 23 • 37
GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks Paper • 2406.12925 • Published 12 days ago • 17
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models Paper • 2405.13974 • Published May 22 • 7
Tulu V2.5 Suite Collection A suite of models trained using DPO and PPO across a wide variety (up to 14) of preference datasets. See https://arxiv.org/abs/2406.09279 for more! • 41 items • Updated 13 days ago • 8
Article Reports on the Hub: A First Look at Self-governance in Open Source AI Development By frimelle • 14 days ago • 6
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models of 5 sizes, including 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 29 items • Updated 20 days ago • 208
MagicPose4D: Crafting Articulated Models with Appearance and Motion Control Paper • 2405.14017 • Published May 22 • 2
Flash Diffusion Collection Collection of models distilled using the method proposed in Flash Diffusion paper • 7 items • Updated 8 days ago • 13
IrokoBench Collection A human-translated benchmark dataset for 16 African languages covering three tasks: NLI, MMLU, and MGSM • 6 items • Updated 26 days ago • 14
Article Introducing NPC-Playground, a 3D playground to interact with LLM-powered NPCs 22 days ago • 12
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models Paper • 2308.11462 • Published Aug 20, 2023 • 2
In-Context Prompt Editing For Conditional Audio Generation Paper • 2311.00895 • Published Nov 1, 2023 • 8
Article CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models May 24 • 20
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance Paper • 2405.12979 • Published May 21 • 9
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21 • 26
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control Paper • 2405.12970 • Published May 21 • 21
Diffusion for World Modeling: Visual Details Matter in Atari Paper • 2405.12399 • Published May 20 • 25
🚀GGUF Collection Llama.cpp compatible models, can be used on CPUs and GPUs! • 666 items • Updated about 5 hours ago • 24
INDUS: Effective and Efficient Language Models for Scientific Applications Paper • 2405.10725 • Published May 17 • 23
ZeroGPU Spaces Collection ZeroGPU Spaces made by the community • 17 items • Updated 20 days ago • 203
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated 29 minutes ago • 118
Granite Code Models Collection A series of code models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 18 items • Updated 27 days ago • 145
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 116
LLM-AD: Large Language Model based Audio Description System Paper • 2405.00983 • Published May 2 • 13
FLAME: Factuality-Aware Alignment for Large Language Models Paper • 2405.01525 • Published May 2 • 21
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment Paper • 2405.01481 • Published May 2 • 22
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2 • 49
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 106
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29 • 67
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting Paper • 2404.18911 • Published Apr 29 • 29
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published Apr 25 • 33
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25 • 56