Hugging Face OSS Metrics

Recent Activity

open-source-metrics's activity

davanstrien 
posted an update 2 days ago
Introducing FineWeb-C 🌐🎓, a community-built dataset for improving language models in ALL languages.

Inspired by FineWeb-Edu, the community is labelling the educational quality of texts in many languages.

318 annotators, 32K+ annotations, 12 languages - and growing! 🌍

data-is-better-together/fineweb-c
fdaudens 
posted an update 3 days ago
🔍 From instruction-following to creative storytelling, dive into 2024's most impactful AI datasets! These gems are shaping everything from scientific research to video understanding.

Check it out: huggingface/open-source-ai-year-in-review-2024
regisss 
posted an update 4 days ago
Xenova 
posted an update 4 days ago
freddyaboulton 
posted an update 4 days ago
fdaudens 
posted an update 4 days ago
🤝 Want to share your AI models while protecting your work? Licenses are key!

Fascinating to see that nearly 60% of models on the Hub use Apache or MIT licenses.

Explore the viz here: huggingface/open-source-ai-year-in-review-2024
sayakpaul 
posted an update 4 days ago
In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.
clem 
posted an update 5 days ago
Coming back to Paris Friday to open our new Hugging Face office!

We're at capacity for the party, but add your name to the waiting list: we're trying to privatize the passage du Caire for extra space for robots 🤖🦾🦿

https://t.co/enkFXjWndJ
freddyaboulton 
posted an update 5 days ago
fdaudens 
posted an update 5 days ago
Did a fun experiment: What are the main themes emerging from the 100+ Nieman Journalism Lab predictions for 2025?

I used natural language processing to cluster and map them, which really helps spot patterns that weren't obvious when reading the predictions one by one. So what will shape journalism next year? A lot of AI and US politics (surprise!), but there's also a horizontal axis that runs from industry strategies to deep reflections on how to talk to the public.

Click any dot to explore the original prediction. Which themes surprise or interest you the most?

👉 fdaudens/nieman_lab_2025_predictions_visualization

P.S.: I discovered that Nieman Lab's content is under a Creative Commons license!
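The post doesn't say which NLP technique produced the clusters. As a rough, stdlib-only sketch of the clustering half, assuming a bag-of-words representation and a k-means variant that assigns by cosine similarity (a plausible stand-in, not necessarily what was used; embedding models are the more common choice today), the snippet below groups short texts by word overlap. The function names and the example "predictions" are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts over lowercased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(term, 0.0) for term, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors: List[Counter]) -> Dict[str, float]:
    """Component-wise mean of sparse count vectors (the cluster centroid)."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return {term: count / len(vectors) for term, count in total.items()}


def cluster_texts(texts: List[str], k: int, iterations: int = 10) -> List[int]:
    """k-means variant that assigns each text to its most cosine-similar centroid."""
    if not 0 < k <= len(texts):
        raise ValueError("k must be between 1 and len(texts)")
    vectors = [vectorize(t) for t in texts]
    # Deterministic initialisation: the first k texts seed the k clusters.
    centroids: List[Dict[str, float]] = [dict(v) for v in vectors[:k]]
    labels = [0] * len(texts)
    for _ in range(iterations):
        # Assignment step: nearest centroid by cosine similarity.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Update step: recompute each non-empty cluster's centroid.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


predictions = [
    "AI tools will reshape newsrooms and AI models",
    "Election coverage and politics in Washington",
    "Newsrooms adopt AI models and AI tools",
    "Politics and the election will dominate coverage",
]
print(cluster_texts(predictions, k=2))  # → [0, 1, 0, 1]
```

The "map" step (projecting the vectors to 2D so each prediction becomes a clickable dot) is not shown here.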
lewtun 
posted an update 6 days ago
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think"

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension of the verifier-guided tree search technique that we developed. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.
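To make the two search strategies above concrete, here is a minimal Python sketch. It is not the search-and-learn code: `propose` is a hypothetical stand-in for sampling candidate next steps from an LLM, and `score` is a hypothetical stand-in for a step-wise (process) reward model. `beam_search` keeps the n // m highest-scoring partial solutions and expands each into m candidates; `dvts` spends the same n samples per step on n // m independent subtrees, each expanded greedily, so one strong prefix cannot crowd out the others. Picking the final path by its score is a simplification of how a final answer would be selected.

```python
import random
from typing import Callable, List, Tuple

Path = List[int]
# Hypothetical interfaces: propose(path, m, rng) -> m candidate next steps,
# score(path) -> step-wise reward for a partial solution.
Proposer = Callable[[Path, int, random.Random], List[int]]
Scorer = Callable[[Path], float]


def beam_search(propose: Proposer, score: Scorer, n: int, m: int,
                depth: int, seed: int = 0) -> Path:
    """Reward-guided beam search: keep the n // m best partial paths per step."""
    rng = random.Random(seed)
    beams: List[Path] = [[] for _ in range(n // m)]
    for _ in range(depth):
        # Expand every surviving beam into m candidates (n in total).
        candidates = [b + [s] for b in beams for s in propose(b, m, rng)]
        # Rank partial solutions with the step-wise scorer; keep the top n // m.
        candidates.sort(key=score, reverse=True)
        beams = candidates[: n // m]
    return max(beams, key=score)


def dvts(propose: Proposer, score: Scorer, n: int, m: int,
         depth: int, seed: int = 0) -> Tuple[Path, List[Path]]:
    """DVTS-style search: n // m independent subtrees, each expanded greedily."""
    finals: List[Path] = []
    for subtree in range(n // m):
        rng = random.Random(seed + subtree)  # independent sampling per subtree
        path: Path = []
        for _ in range(depth):
            candidates = [path + [s] for s in propose(path, m, rng)]
            path = max(candidates, key=score)  # greedy: keep the best of m
        finals.append(path)
    return max(finals, key=score), finals


# Toy task (hypothetical): choose 4 steps from {0, 1, 2, 3} summing to 10.
def propose(path: Path, k: int, rng: random.Random) -> List[int]:
    return [rng.randint(0, 3) for _ in range(k)]  # stand-in for LLM sampling


def score(path: Path) -> float:
    # Stand-in reward model: partial sums should track 2.5 per step.
    return -abs(sum(path) - 2.5 * len(path))


print(beam_search(propose, score, n=16, m=4, depth=4))
best, finals = dvts(propose, score, n=16, m=4, depth=4)
print(best, len(finals))
```

With a real LLM the subtrees sample different steps, which is where DVTS's extra diversity comes from; with a deterministic proposer all subtrees would collapse to the same path.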

Here are the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!
celinah 
posted an update 6 days ago
🚀 We've just released v0.27.0 of the huggingface_hub Python library!

This release includes:
- 💾 New torch model loading utilities in the serialization module — providing a standardized way to save and load torch models with built-in support for sharding and safe serialization.
- 📦 Tooling for something exciting — if you like single-file formats for models like GGUF, you'll love what we're cooking up 👀 More coming soon!
- 🛠️ Loads of quality-of-life improvements and bug fixes!

Release notes and full details here 👇
Wauplin/huggingface_hub#10

$ pip install -U huggingface_hub
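The release mentions built-in support for sharding. As a stdlib-only illustration of what sharding a checkpoint by size involves, assuming a greedy, order-preserving packing rule, the sketch below groups tensor names into shards under a byte limit and builds a name-to-file map. It is not the huggingface_hub implementation: `shard_by_size`, the tensor names and sizes, and the filename pattern are all hypothetical. For the real utilities, see the release notes linked above.

```python
from typing import Dict, List, Tuple


def shard_by_size(
    tensor_sizes: Dict[str, int], max_shard_bytes: int
) -> Tuple[List[List[str]], Dict[str, str]]:
    """Greedily pack tensors, in insertion order, into shards of at most max_shard_bytes.

    A tensor larger than the limit ends up in a shard of its own.
    Returns the shards and a map from tensor name to (hypothetical) shard filename.
    """
    shards: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in tensor_sizes.items():
        # Close the current shard if adding this tensor would exceed the limit.
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    total = len(shards)
    # Hypothetical naming scheme in the style of "model-00001-of-00003".
    weight_map = {
        name: f"model-{i + 1:05d}-of-{total:05d}.safetensors"
        for i, shard in enumerate(shards)
        for name in shard
    }
    return shards, weight_map


sizes = {"embed.weight": 400, "layer0.weight": 300, "layer1.weight": 300, "head.weight": 900}
shards, weight_map = shard_by_size(sizes, max_shard_bytes=700)
print(shards)  # → [['embed.weight', 'layer0.weight'], ['layer1.weight'], ['head.weight']]
```

Packing in order keeps neighbouring tensors together, and the name-to-file map is what a loader needs to find each tensor without opening every shard.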
fdaudens 
posted an update 8 days ago
freddyaboulton 
posted an update 10 days ago
Version 0.0.21 of gradio-pdf now properly loads Chinese characters!
freddyaboulton 
posted an update 10 days ago
Hello Llama 3.2! 🗣️🦙

Build a Siri-like coding assistant that responds to "Hello Llama" in 100 lines of Python! All with Gradio and WebRTC 😎

freddyaboulton/hey-llama-code-editor
fdaudens 
posted an update 10 days ago
freddyaboulton 
posted an update 12 days ago
sayakpaul 
posted an update 12 days ago
Introducing a high-quality open-preference dataset to further this line of research for image generation.

Although preference data is an inseparable component of modern image generation, open preference datasets are a rarity!

So, we decided to work on one with the community!

Check it out here:
https://huggingface.co/blog/image-preferences
sayakpaul 
posted an update 13 days ago
The Control family of Flux from @black-forest-labs should be discussed more!

It enables structural controls like ControlNets while being significantly less expensive to run!

So, we're working on a Control LoRA training script 🤗

It's still WIP, so go easy:
https://github.com/huggingface/diffusers/pull/10130