[Experiment] Applying GRPO to DeepSeek-R1-Distill-Qwen-1.5B with LIMO
pinned
21
#15 opened 30 days ago
by
lewtun

[Paper review] Small Models Struggle to Learn from Strong Reasoners
#19 opened 14 days ago
by
lewtun

Seeking Clarification on GRPO's Core Mechanisms (for Independent Implementation)
#18 opened 20 days ago
by
bird-of-paradise
⚠️ Chat template foot gun with DeepSeek distilled models and RL format reward function
6
#17 opened 23 days ago
by
lewtun

[Experiment] Applying Open-R1-Math-220k to smolThinking models.
1
#16 opened 27 days ago
by
HarleyCooper
GoogleDeepMind Unstructured-To-JSON Model
#14 opened about 1 month ago
by
bhaviktheslider
DeepSeek Distilled 32B Responding in Multi Language on English Prompting
1
#13 opened about 1 month ago
by
bhaviktheslider
DeepSeek R1 Replication on Qwen 2.5 1.5B for Unstructured to Structured JSON Conversion
2
#12 opened about 1 month ago
by
bhaviktheslider
Generating Synthetic questions with "Reverse Question Answering"?
#11 opened about 1 month ago
by
georgebassemfouad
Multimodal R1
1
#10 opened about 1 month ago
by
salma-remyx

Replicated R1 Strategy on 8*H100 GPUs - For Qwen-2.5-1.5b
#9 opened about 1 month ago
by
bhaviktheslider
Unstructured Text to Structured Schema based on rules - Deepseek Distilled 7b Thinking responses
3
#6 opened about 1 month ago
by
bhaviktheslider
SmolLm2-135 R1 Distill
1
#5 opened about 1 month ago
by
ewre324
What is the compute needed for GRPO for 7B R1-Distill model?
#4 opened about 1 month ago
by
AndrewSanders
Reproducing Deepseek's numbers for MATH-500
#3 opened about 1 month ago
by
edbeeching

Recommend a dataset in the scientific domain made by us: EricLu/SCP-116K
3
#2 opened about 1 month ago
by
EricLu
LLM Benchmarks and Data Leakage
3
#1 opened about 1 month ago
by
dvamvour