---
title: README
emoji: 🐠
colorFrom: yellow
colorTo: yellow
sdk: static
pinned: false
---

# Hugging Face Research

The science team at Hugging Face is dedicated to advancing machine learning research in ways that maximize value for the whole community. Our work focuses on three core areas: tooling, datasets, and open models.

### 🛠️ Tooling & Infrastructure

Tooling and infrastructure are the foundation of ML research. We develop a range of tools, including [datatrove](https://github.com/huggingface/datatrove), [nanotron](https://github.com/huggingface/nanotron), [TRL](https://github.com/huggingface/trl), [LeRobot](https://github.com/huggingface/lerobot), and [lighteval](https://github.com/huggingface/lighteval).

### 📑 Datasets

High-quality datasets are the powerhouse of LLMs and require special care and skill to build. We focus on building datasets such as [no-robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots), [FineWeb](https://huggingface.co/datasets/HuggingFaceFW/fineweb), [The Stack](https://huggingface.co/datasets/bigcode/the-stack-v2), and [FineVideo](https://huggingface.co/datasets/HuggingFaceFV/finevideo).
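As a rough illustration of how the datasets linked above might be inspected, here is a minimal Python sketch. The repo ids come from the links in this section; the helper names `dataset_url` and `peek` are hypothetical, and `peek` assumes the `datasets` library is installed and that network access is available. Some repos may require accepting terms on the Hub first.

```python
from itertools import islice

# Repo ids copied from the dataset links in this README.
DATASET_REPOS = {
    "no-robots": "HuggingFaceH4/no_robots",
    "FineWeb": "HuggingFaceFW/fineweb",
    "The Stack": "bigcode/the-stack-v2",
    "FineVideo": "HuggingFaceFV/finevideo",
}


def dataset_url(key: str) -> str:
    """Return the Hub URL for one of the datasets listed above."""
    return f"https://huggingface.co/datasets/{DATASET_REPOS[key]}"


def peek(key: str, n: int = 3, config=None):
    """Stream the first n rows instead of downloading the whole dataset.

    Hypothetical helper: needs `pip install datasets` and network access.
    """
    from datasets import load_dataset  # imported lazily so the module loads offline

    stream = load_dataset(
        DATASET_REPOS[key], name=config, split="train", streaming=True
    )
    return list(islice(stream, n))


print(dataset_url("FineWeb"))
```

For instance, `peek("FineWeb", config="sample-10BT")` would stream a few rows; the config name `sample-10BT` is an assumption recalled from the dataset card and should be checked there before use.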

### 🤖 Open Models

The datasets and training recipes of most state-of-the-art models are not released. We build cutting-edge models and release the full training pipeline as well, fostering more innovation and reproducibility. Examples include [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), [StarCoder2](https://huggingface.co/bigcode/starcoder2-15b), and [SmolLM2](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct).

### 🌸 Collaborations

Research and collaboration go hand in hand. That's why we like to organize and participate in large open collaborations such as [BigScience](https://bigscience.huggingface.co) and [BigCode](https://www.bigcode-project.org). 

### ⚙️ Infrastructure

The research team is organized into small teams, typically of fewer than four people. The science cluster consists of 96 x 8xH100 nodes as well as an auto-scalable CPU cluster for dataset processing. In this setup, even a small research team can build and push out impactful artifacts.
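To put the cluster size in perspective, the short sketch below works out the total GPU count. It assumes "96 x 8xH100" means 96 nodes with eight H100 GPUs each; the `nodes_per_team` helper and the team count used with it are purely hypothetical, since this README does not say how the cluster is shared.

```python
# Figures quoted in the paragraph above.
NODES = 96
GPUS_PER_NODE = 8  # assumption: "8xH100" means eight H100 GPUs per node

total_gpus = NODES * GPUS_PER_NODE
print(f"{NODES} nodes x {GPUS_PER_NODE} GPUs = {total_gpus} H100 GPUs")


def nodes_per_team(num_teams: int) -> int:
    """Nodes each team would get under a hypothetical even split."""
    return NODES // num_teams
```

Under that reading the cluster holds 768 H100 GPUs in total.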

### 📖 Educational material

Besides writing tech reports on our research projects, we also like to write educational content that helps newcomers and practitioners get started in the field. Examples include the [alignment handbook](https://github.com/huggingface/alignment-handbook), the [pretraining tutorial](https://www.youtube.com/watch?v=2-SPH9hIKT8), and the [FineWeb blog](https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1).

### 🚀 Releases

This is the release timeline of 2024 so far (you can click on each element!):

<style>
@import url('https://fonts.googleapis.com/css2?family=DM+Sans:wght@400;600&display=swap');

.timeline-container {
    max-width: 2000px;
    margin: 140px auto 40px;
    padding: 20px;
    overflow-x: scroll;
    font-family: 'DM Sans', sans-serif;
}

.timeline-container::-webkit-scrollbar {
    display: none;  /* Chrome, Safari, Edge */
}

.timeline {
    position: relative;
    min-width: 1800px; /* Minimum width before scrolling */
}

.line {
    position: absolute;
    width: 100%;
    height: 2px;
    background: #333;
    bottom: 30px;
}

.timeline-items {
    display: flex;
    justify-content: space-between;
    align-items: flex-end;
    position: relative;
    min-height: 200px;
}

.month-marker {
    display: flex;
    flex-direction: column;
    align-items: center;
    position: relative;
    flex: 1;
}

.month-dot {
    width: 12px;
    height: 12px;
    background: #000;
    border-radius: 50%;
    margin-bottom: -7px;
    position: relative;
    z-index: 1;
}

.month-label {
    font-weight: bold;
    margin-top: 10px;
}

.events-container {
    position: absolute;
    bottom: 40px;
    left: 50%;
    transform: translateX(-50%);
    width: 200px;
    text-align: left;
}

.event {
    position: relative;
    left: 92px;
    margin: 8px 0;
    font-size: 14px;
    display: flex;
    align-items: flex-start;
    gap: 5px;
    white-space: pre-wrap;
    max-width: 150px;
}

.event img {
    width: 20px;
    height: 20px;
    vertical-align: middle;
}

.event a {
    /*color: #000000;*/
    text-decoration: none;
}

.event a:hover {
    color: #686868;
    text-decoration: underline;
}
</style>

<div class="timeline-container">
    <div class="timeline">
        <div class="line"></div>
        <div class="timeline-items">
            <div class="month-marker">
                <div class="events-container">
                    <div class="event">🔥 Warming up</div>
                </div>
                <div class="month-dot"></div>
                <div class="month-label">Jan</div>
            </div>
            <div class="month-marker">
                <div class="events-container">
                    <div class="event">⚙️<a href="https://github.com/huggingface/nanotron/" target="_blank">Nanotron Release</a></div>
                    <div class="event">⭐️<a href="https://huggingface.co/datasets/bigcode/the-stack-v2" target="_blank">The Stack v2</a></div>
                    <div class="event">⭐️<a href="https://huggingface.co/bigcode/starcoder2-15b" target="_blank">StarCoder2</a></div>
                </div>
                <div class="month-dot"></div>
                <div class="month-label">Feb</div>
            </div>
            <div class="month-marker">
                <div class="events-container">
                    <div class="event">🪁<a href="https://huggingface.co/HuggingFaceH4/zephyr-7b-gemma-v0.1" target="_blank">Zephyr Gemma</a></div>
                    <div class="event">🪐<a href="https://huggingface.co/datasets/HuggingFaceTB/cosmopedia" target="_blank">Cosmopedia</a></div>
                </div>
                <div class="month-dot"></div>
                <div class="month-label">Mar</div>
            </div>
            <div class="month-marker">
                <div class="events-container">
                    <div class="event">🍷<a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb" target="_blank">FineWeb</a></div>
                    <div class="event">🕵️<a href="https://huggingface.co/blog/jat" target="_blank">JAT Agent</a></div>
                    <div class="event">🪁<a href="https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1" target="_blank">Zephyr Mixtral</a></div>
                    <div class="event">🐶<a href="https://huggingface.co/HuggingFaceM4/idefics2-8b" target="_blank">Idefics 2</a></div>
                    <div class="event">🏆<a href="https://huggingface.co/blog/leaderboard-medicalllm" target="_blank">Community Leaderboards</a></div>
                </div>
                <div class="month-dot"></div>
                <div class="month-label">Apr</div>
            </div>
            <div class="month-marker">
                <div class="events-container">
                    <div class="event">🦾<a href="https://github.com/huggingface/lerobot" target="_blank">LeRobot Release</a></div>
                    <div class="event">📈<a href="https://arxiv.org/abs/2405.18392" target="_blank">WSD Analysis</a></div>
                </div>
                <div class="month-dot"></div>
                <div class="month-label">May</div>
            </div>
            <div class="month-marker">
                <div class="events-container">
                    <div class="event">🍷<a href="https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1" target="_blank">FineWeb Report</a></div>
                    <div class="event">🍷<a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu" target="_blank">FineWeb-Edu</a></div>
                    <div class="event">🌺<a href="https://huggingface.co/blog/finetune-florence2" target="_blank">Florence 2 Blog</a></div>
                    <div class="event">🏆<a href="https://huggingface.co/spaces/open-llm-leaderboard/blog" target="_blank">Open LLM Leaderboard v2</a></div>
                    <div class="event">👩‍🏫<a href="https://www.youtube.com/watch?v=jm2hyJLFfN8" target="_blank">Stanford CS25</a></div>
                </div>
                <div class="month-dot"></div>
                <div class="month-label">Jun</div>
            </div>
            <div class="month-marker">
                <div class="events-container">
                    <div class="event">🛟<a href="https://x.com/Haojun_Zhao14/status/1815419356408336738" target="_blank">Ring attention</a></div>
                    <div class="event">🦾<a href="https://x.com/RemiCadene/status/1805583409382932620" target="_blank">LeRobot TeleOps</a></div>
                    <div class="event">🥇<a href="https://x.com/_lewtun/status/1808898804822720769" target="_blank">Win AIMO</a></div>
                    <div class="event">🐶<a href="https://huggingface.co/blog/docmatix" target="_blank">Docmatix</a></div>
                    <div class="event">🤏<a href="https://huggingface.co/blog/smollm" target="_blank">SmolLM</a></div>
                </div>
                <div class="month-dot"></div>
                <div class="month-label">Jul</div>
            </div>
            <div class="month-marker">
                <div class="events-container">
                    <div class="event">🦾<a href="https://github.com/huggingface/lerobot/blob/main/examples/7_get_started_with_real_robot.md" target="_blank">LeRobot Tutorial</a></div>
                    <div class="event">📣<a href="https://github.com/huggingface/speech-to-speech" target="_blank">Speech-to-Speech</a></div>
                    <div class="event">🐶<a href="https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3" target="_blank">Idefics 3</a></div>
                    <div class="event">🤏<a href="https://huggingface.co/spaces/HuggingFaceTB/instant-smollm" target="_blank">Instant SmolLM</a></div>
                </div>
                <div class="month-dot"></div>
                <div class="month-label">Aug</div>
            </div>
            <div class="month-marker">
                <div class="events-container">
                    <div class="event">🦾<a href="https://x.com/alibert_s/status/1828760527730082024" target="_blank">LeRobot Video</a></div>
                    <div class="event">🎥<a href="https://huggingface.co/spaces/HuggingFaceFV/FineVideo-Explorer" target="_blank">FineVideo</a></div>
                    <div class="event">📣<a href="https://x.com/andi_marafioti/status/1830862304906268725" target="_blank">Speech-to-Speech Multilingual</a></div>
                </div>
                <div class="month-dot"></div>
                <div class="month-label">Sep</div>
            </div>
            <div class="month-marker">
                <div class="events-container">
                    <div class="event">🔎<a href="https://github.com/huggingface/evaluation-guidebook" target="_blank">LLM Evaluation Guidebook</a></div>
                    <div class="event">🦾<a href="https://x.com/Thom_Wolf/status/1851557379294286176" target="_blank">LeRobot Hackathon</a></div>
                    <div class="event">🗺️<a href="https://huggingface.co/spaces/HuggingFaceFW/blogpost-fine-tasks" target="_blank">FineTasks</a></div>
                </div>
                <div class="month-dot"></div>
                <div class="month-label">Oct</div>
            </div>
            <div class="month-marker">
                <div class="events-container">
                    <div class="event">🤏<a href="https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct" target="_blank">SmolLM2</a></div>
                </div>
                <div class="month-dot"></div>
                <div class="month-label">Nov</div>
            </div>
        </div>
    </div>
</div>

### 🤗 Join us!

We are actively hiring for both full-time positions and internships. Check out [hf.co/jobs](https://hf.co/jobs).