Balancing Pipeline Parallelism with Vocabulary Parallelism Paper • 2411.05288 • Published Nov 2024 • 18
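This paper targets the pipeline-bubble imbalance caused by the vocabulary layers. Its scheduling scheme is not reproduced here, but the core idea of vocabulary parallelism, sharding the LM head's vocabulary dimension and combining partial softmax statistics, can be sketched in a single process as below; the sizes and shard count are illustrative only.

```python
import torch

# Minimal single-process sketch of vocabulary parallelism: the LM head's
# vocabulary dimension is sharded across "workers", each computes partial
# logits, and the softmax normalizer is combined across shards.
# Dimensions and shard count are illustrative, not from the paper.
torch.manual_seed(0)
hidden, vocab, shards = 64, 1000, 4

x = torch.randn(8, hidden)                      # hidden states for 8 tokens
lm_head = torch.randn(vocab, hidden) / hidden**0.5

# Each shard holds a contiguous slice of the vocabulary rows.
shard_weights = lm_head.chunk(shards, dim=0)

# Per-shard partial logits and partial log-sum-exp of the softmax denominator.
partial_logits = [x @ w.t() for w in shard_weights]            # [8, vocab/shards] each
partial_lse = [p.logsumexp(dim=-1) for p in partial_logits]    # [8] each

# Combine: the global normalizer is the log-sum-exp of the per-shard ones
# (the reduction a real implementation would do with an all-reduce).
global_lse = torch.stack(partial_lse, dim=0).logsumexp(dim=0)

# Reference: unsharded log-probabilities match the sharded computation.
ref_logits = x @ lm_head.t()
ref_logprobs = ref_logits - ref_logits.logsumexp(dim=-1, keepdim=True)
shard0_logprobs = partial_logits[0] - global_lse[:, None]
assert torch.allclose(ref_logprobs[:, : vocab // shards], shard0_logprobs, atol=1e-5)
```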
LoLCATS Collection Linearizing LLMs with high quality and efficiency. We linearize the full Llama 3.1 model family (8B, 70B, and 405B) for the first time! • 4 items • Updated Oct 14 • 14
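Linearizing here means replacing softmax attention with a linear-attention form; LoLCATS learns feature maps (followed by low-rank fine-tuning) so the approximation stays faithful. The sketch below uses a fixed ELU+1 feature map as a stand-in for the learned maps, just to show the O(T) recurrent form that linearization buys.

```python
import torch

def causal_linear_attention(q, k, v):
    """Causal linear attention with a fixed ELU+1 feature map.

    LoLCATS learns its feature maps to mimic softmax attention; the fixed
    map here is only a stand-in to illustrate the O(T) recurrent form.
    """
    phi_q = torch.nn.functional.elu(q) + 1          # [T, d]
    phi_k = torch.nn.functional.elu(k) + 1          # [T, d]
    T, d = q.shape
    state = torch.zeros(d, v.shape[-1])             # running sum of phi(k)^T v
    norm = torch.zeros(d)                           # running sum of phi(k)
    out = torch.empty_like(v)
    for t in range(T):                              # O(T) instead of O(T^2)
        state = state + phi_k[t, :, None] * v[t, None, :]
        norm = norm + phi_k[t]
        out[t] = (phi_q[t] @ state) / (phi_q[t] @ norm + 1e-6)
    return out

q, k, v = torch.randn(16, 32), torch.randn(16, 32), torch.randn(16, 32)
print(causal_linear_attention(q, k, v).shape)       # torch.Size([16, 32])
```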
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18 • 36
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12 • 14
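Packing concatenates several short documents into one row and relies on FlashAttention's variable-length interface so attention never crosses a document boundary. The sketch below only builds the packing metadata (cumulative sequence lengths, per-document position ids, and the equivalent block-diagonal mask); the actual flash_attn_varlen_func call and the Hugging Face integration described in the paper are omitted, and the token ids are made up.

```python
import torch

# Sketch of sequence packing as used with FlashAttention's variable-length
# kernels: documents are concatenated into one row, and cumulative sequence
# lengths (cu_seqlens) plus per-document position ids tell the kernel where
# each document starts. Token ids and lengths below are illustrative.
docs = [[5, 8, 2], [7, 7, 7, 7, 9], [3, 1]]

input_ids = torch.tensor([t for d in docs for t in d])
seqlens = torch.tensor([len(d) for d in docs])
cu_seqlens = torch.cat([torch.zeros(1, dtype=torch.long), seqlens.cumsum(0)])
position_ids = torch.cat([torch.arange(n) for n in seqlens])

print(cu_seqlens.tolist())     # [0, 3, 8, 10] -> document boundaries
print(position_ids.tolist())   # [0, 1, 2, 0, 1, 2, 3, 4, 0, 1]

# Equivalent dense mask (what the varlen kernel avoids materializing):
# causal within each document, blocked across documents.
T = input_ids.numel()
doc_id = torch.repeat_interleave(torch.arange(len(docs)), seqlens)
mask = (doc_id[:, None] == doc_id[None, :]) & \
       (torch.arange(T)[:, None] >= torch.arange(T)[None, :])
```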
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23 • 22
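The exact schedule derived in the paper is not reproduced here; as a rough, hedged sketch of the general shape (linear warmup followed by a power-law decay clamped at a maximum rate), something like the following conveys the idea. All constants are placeholders, not the paper's values.

```python
def power_lr(step: int, lr_max: float = 3e-3, warmup: int = 1000,
             a: float = 0.1, b: float = 0.5) -> float:
    """Hedged sketch of a power-law learning-rate schedule.

    Linear warmup, then a decay proportional to step**(-b), clamped at
    lr_max. The constants a, b, lr_max, and warmup are placeholders; the
    paper derives its own parameterization (agnostic to batch size and
    total token count), so consult it for the exact formula.
    """
    if step < warmup:
        return lr_max * (step + 1) / warmup
    return min(lr_max, a * (step + 1) ** (-b))

for s in [0, 500, 1000, 10_000, 100_000]:
    print(s, f"{power_lr(s):.6f}")
```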
Power-LM Collection Dense & MoE LLMs trained with power learning rate scheduler. • 4 items • Updated Oct 17 • 15
Article Introducing AuraFace: Open-Source Face Recognition and Identity Preservation Models By isidentical • Aug 26 • 36
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20 • 41
💻 Local SmolLMs Collection SmolLM models in MLC, ONNX and GGUF format for local applications + in-browser demos • 14 items • Updated Aug 20 • 45
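A hedged example of running one of the GGUF builds locally with llama-cpp-python; the file name is a placeholder for whichever quantization you download from the collection.

```python
# Hedged sketch: running a GGUF SmolLM build locally with llama-cpp-python.
# The model_path is a placeholder; download a GGUF file from the collection
# first and point it there.
from llama_cpp import Llama

llm = Llama(model_path="./smollm-360m-instruct.q4_k_m.gguf", n_ctx=2048)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me one sentence about the Moon."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```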
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 198
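A minimal sketch of loading one of the SmolLM checkpoints with transformers; the repo id is an assumption based on the collection's naming, so substitute the size and variant you actually want.

```python
# Assumed repo id; any of the base or Instruct SmolLM checkpoints works the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "HuggingFaceTB/SmolLM-360M-Instruct"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

messages = [{"role": "user", "content": "What is gravity?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
out = model.generate(inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```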
Article A failed experiment: Infini-Attention, and why we should keep trying? • Aug 14 • 50
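For context, Infini-Attention augments local attention with a compressive memory that is read with a linear-attention style lookup and then updated with the current segment's keys and values. The sketch below roughly follows that formulation and omits details such as the delta-rule update and the learned gate that mixes memory output with local attention.

```python
import torch

def elu1(x):
    return torch.nn.functional.elu(x) + 1

def infini_memory_step(q, k, v, M, z):
    """One segment of Infini-Attention's compressive memory (rough sketch).

    q, k, v: [T, d] for the current segment; M: [d, d_v] memory; z: [d] normalizer.
    Retrieval reads what earlier segments wrote; the update then folds the
    current segment into the memory.
    """
    sq, sk = elu1(q), elu1(k)
    retrieved = (sq @ M) / (sq @ z + 1e-6)[:, None]   # read from memory
    M = M + sk.t() @ v                                # write current keys/values
    z = z + sk.sum(dim=0)
    return retrieved, M, z

d, dv = 32, 32
M, z = torch.zeros(d, dv), torch.zeros(d)
for _ in range(3):  # stream three segments through the compressive memory
    q, k, v = (torch.randn(16, d) for _ in range(3))
    retrieved, M, z = infini_memory_step(q, k, v, M, z)
print(retrieved.shape)  # torch.Size([16, 32])
```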
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13 • 65
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU Clusters Paper • 2408.04093 • Published Aug 7 • 4
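The idea is that each device computes attention over its own slice of keys and values plus a log-sum-exp statistic, and because the combine step is associative the partials can be reduced in a tree across the cluster. A single-query NumPy sketch (no distributed plumbing) is below.

```python
import numpy as np

# Each "chunk" stands in for one device's slice of keys/values. The partial
# outputs carry a log-sum-exp statistic, and the merge is associative, so the
# reduction can be done pairwise in a tree. Single-query (decoding) case.
def chunk_partial(q, K, V):
    s = K @ q                                        # scores for this chunk
    m = s.max()
    w = np.exp(s - m)
    return (w @ V) / w.sum(), m + np.log(w.sum())    # (partial output, lse)

def merge(a, b):
    (oa, la), (ob, lb) = a, b
    l = np.logaddexp(la, lb)
    return np.exp(la - l) * oa + np.exp(lb - l) * ob, l

rng = np.random.default_rng(0)
d, T, n_chunks = 16, 128, 4
q, K, V = rng.normal(size=d), rng.normal(size=(T, d)), rng.normal(size=(T, d))

parts = [chunk_partial(q, Kc, Vc)
         for Kc, Vc in zip(np.split(K, n_chunks), np.split(V, n_chunks))]
while len(parts) > 1:                                # pairwise tree reduction
    parts = [merge(parts[i], parts[i + 1]) for i in range(0, len(parts), 2)]

# Reference: full softmax attention over all keys at once.
s = K @ q
ref = (np.exp(s - s.max()) @ V) / np.exp(s - s.max()).sum()
assert np.allclose(parts[0][0], ref)
```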
Article The case for specialized pre-training: ultra-fast foundation models for dedicated tasks By Pclanglais • Aug 4 • 26