Open LLM Leaderboard

Enterprise

community

https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard

Activity Feed

AI & ML interests

Evaluating open LLMs

Recent Activity

open-llm-bot updated a dataset less than a minute ago

open-llm-leaderboard/suayptalha__DeepSeek-R1-Distill-Llama-3B-details

open-llm-bot updated a dataset 2 minutes ago

open-llm-leaderboard/results

open-llm-bot updated a dataset 3 minutes ago

open-llm-leaderboard/Nexesenex__Nemotron_W_4b_MagLight_0.1-details

View all activity

open-llm-leaderboard's activity

open-llm-bot

updated a dataset less than a minute ago

open-llm-leaderboard/suayptalha__DeepSeek-R1-Distill-Llama-3B-details

Updated less than a minute ago

open-llm-bot

updated a dataset 2 minutes ago

open-llm-leaderboard/results

Updated 2 minutes ago • 78.5k • 9

open-llm-bot

updated a dataset 3 minutes ago

open-llm-leaderboard/Nexesenex__Nemotron_W_4b_MagLight_0.1-details

Updated 3 minutes ago

open-llm-bot

updated a dataset 5 minutes ago

open-llm-leaderboard/JayHyeon__Qwen_0.5-VDPO_5e-7-1ep_3vpo_const-details

Updated 5 minutes ago

open-llm-bot

updated a dataset 6 minutes ago

open-llm-leaderboard/JayHyeon__Qwen_0.5-VDPO_5e-7-1ep_10vpo_const-details

Viewer • Updated 6 minutes ago • 41.7k

AdinaY

posted an update about 3 hours ago

Post

107

Try QwQ-Max-Preview, Qwen's reasoning model here👉 https://chat.qwen.ai
Can't wait for the model weights to drop on the Hugging Face Hub 🔥

AdinaY

posted an update about 14 hours ago

Post

681

Two AI startups, DeepSeek & Moonshot AI , keep moving in perfect sync 👇

✨ Last December: DeepSeek & Moonshot AI released their reasoning models on the SAME DAY.
DeepSeek: deepseek-ai/DeepSeek-R1
MoonShot: https://github.com/MoonshotAI/Kimi-k1.5

✨ Last week: Both teams published papers on modifying attention mechanisms on the SAME DAY AGAIN.
DeepSeek: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2502.11089)
Moonshot: MoBA: Mixture of Block Attention for Long-Context LLMs (2502.13189)

✨ TODAY:
DeepSeek unveiled Flash MLA: a efficient MLA decoding kernel for NVIDIA Hopper GPUs, optimized for variable-length sequences.
https://github.com/deepseek-ai/FlashMLA

Moonshot AI introduces Moonlight: a 3B/16B MoE trained on 5.7T tokens using Muon, pushing the Pareto frontier with fewer FLOPs.
moonshotai/Moonlight-16B-A3B

What's next? 👀

AdinaY

posted an update 5 days ago

Post

690

VLM-R1🔥bringing DeepSeek’s R1 method to vision language models!

GitHub: https://github.com/om-ai-lab/VLM-R1
Demo: omlab/VLM-R1-Referral-Expression

AdinaY

posted an update 6 days ago

Post

4147

🚀 StepFun阶跃星辰 is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm 🔥but many didn’t know they were also building some amazing models. Now, they’ve just dropped something huge on the hub!

📺 Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10s) at 540P resolution with high information density & consistency.
stepfun-ai/stepvideo-t2v

🔊 Step-Audio-TTS-3B : a TTS trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating RAP & Humming
stepfun-ai/step-audio-67b33accf45735bb21131b0b

3 replies

AdinaY

posted an update 6 days ago

Post

2392

The latest paper of DeepSeek is now available on the Daily Papers page 🚀
You can reach out to the authors directly on this page👇
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2502.11089)

1 reply

AdinaY

posted an update 11 days ago

Post

2539

Ovis2 🔥 a multimodal LLM released by Alibaba AIDC team.
AIDC-AI/ovis2-67ab36c7e497429034874464
✨1B/2B/4B/8B/16B/34B
✨Strong CoT for deeper problem solving
✨Multilingual OCR – Expanded beyond English & Chinese, with better data extraction

AdinaY

posted an update 12 days ago

Post

3534

InspireMusic 🎵🔥 an open music generation framework by Alibaba FunAudio Lab
Model: FunAudioLLM/InspireMusic-1.5B-Long
Demo: FunAudioLLM/InspireMusic
✨ Music, songs, audio - ALL IN ONE
✨ High quality audio: 24kHz & 48kHz sampling rates
✨ Long-Form Generation: enables extended audio creation
✨ Efficient Fine-Tuning: precision (BF16, FP16, FP32) with user-friendly scripts

1 reply

lewtun

posted an update 14 days ago

Post

4510

Introducing OpenR1-Math-220k!

open-r1/OpenR1-Math-220k

The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch 💪

What’s new compared to existing reasoning datasets?

♾ Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.

🐳 800k R1 reasoning traces: We generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.

📀 512 H100s running locally: Instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.

⏳ Automated filtering: We apply Math Verify to only retain problems with at least one correct answer. We also leverage Llama3.3-70B-Instruct as a judge to retrieve more correct examples (e.g for cases with malformed answers that can’t be verified with a rules-based parser)

📊 We match the performance of DeepSeek-Distill-Qwen-7B by finetuning Qwen-7B-Math-Instruct on our dataset.

🔎 Read our blog post for all the nitty gritty details: https://huggingface.co/blog/open-r1/update-2

hynky

authored a paper 18 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 20 days ago • 191

lewtun

authored a paper 18 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 20 days ago • 191

clefourrier

authored a paper 18 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 20 days ago • 191

thomwolf

authored a paper 18 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 20 days ago • 191

lvwerra

authored a paper 19 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 20 days ago • 191

AdinaY

posted an update 19 days ago

Post

3068

Xwen 🔥 a series of open models based on Qwen2.5 models, developed by a brilliant research team of PhD students from the Chinese community.
shenzhi-wang/xwen-chat-679e30ab1f4b90cfa7dbc49e
✨ 7B/72B
✨ Apache 2.0
✨ Xwen-72B-Chat outperformed DeepSeek V3 on Arena Hard Auto

victor

posted an update 20 days ago

Post

3968

Hey everyone, we've given https://hf.co/spaces page a fresh update!

Smart Search: Now just type what you want to do—like "make a viral meme" or "generate music"—and our search gets it.

New Categories: Check out the cool new filter bar with icons to help you pick a category fast.

Redesigned Space Cards: Reworked a bit to really show off the app descriptions, so you know what each Space does at a glance.

Random Prompt: Need ideas? Hit the dice button for a burst of inspiration.

We’d love to hear what you think—drop us some feedback plz!

5 replies

AI & ML interests

Recent Activity

Team members 18

open-llm-leaderboard's activity