Hugging Face Internal Testing Organization

AI & ML interests

None defined yet.

Recent Activity

hf-internal-testing's activity

anton-l
posted an update 3 days ago
Introducing 📐 FineMath: the best public math pre-training dataset with 50B+ tokens!
HuggingFaceTB/finemath

Math remains challenging for LLMs, and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.

We build the dataset by:
🛠️ carefully extracting math data from Common Crawl;
🔎 iteratively filtering and recalling high-quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.

We conducted a series of ablations comparing the performance of Llama-3.2-3B-Base after continued pre-training on FineMath and observed notable gains compared to the baseline model and other public math datasets.

We hope this helps advance the performance of LLMs on math and reasoning! 🚀
We're also releasing all the ablation models as well as the evaluation code.

HuggingFaceTB/finemath-6763fb8f71b6439b653482c2

Xenova
posted an update 3 days ago
Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
🚀 Faster and more accurate than Whisper
🔒 Privacy-focused (no data leaves your device)
⚡️ WebGPU accelerated (w/ WASM fallback)
🔥 Powered by ONNX Runtime Web and Transformers.js

Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web

sayakpaul
posted an update 4 days ago
In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.

lewtun
posted an update 6 days ago
We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥

How? By combining step-wise reward models with tree search algorithms :)

We show that smol models can match or exceed the performance of their much larger siblings when given enough "time to think".

We're open sourcing the full recipe and sharing a detailed blog post.

In our blog post we cover:

📈 Compute-optimal scaling: How we implemented DeepMind's recipe to boost the mathematical capabilities of open models at test time.

🎄 Diverse Verifier Tree Search (DVTS): An unpublished extension we developed for the verifier-guided tree search technique. This simple yet effective method improves diversity and delivers better performance, particularly at large test-time compute budgets.

🧭 Search and Learn: A lightweight toolkit for implementing search strategies with LLMs, built for speed with vLLM.

Here are the links:

- Blog post: HuggingFaceH4/blogpost-scaling-test-time-compute

- Code: https://github.com/huggingface/search-and-learn

Enjoy!
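
To make the idea of pairing a step-wise reward model with search more concrete, here is a minimal toy sketch of beam search over reasoning steps. It is not the search-and-learn implementation: `beam_search`, `toy_expand`, `toy_reward`, and the small arithmetic puzzle are all hypothetical stand-ins. `toy_expand` plays the role of the LLM proposing next steps, and `toy_reward` plays the role of the reward model scoring partial solutions.

```python
from typing import Callable, Dict, List

Path = List[str]

# Hypothetical "reasoning steps": each one transforms a running value.
STEPS: Dict[str, Callable[[int], int]] = {
    "+1": lambda x: x + 1,
    "*2": lambda x: x * 2,
    "*3": lambda x: x * 3,
}
START, TARGET = 1, 18


def evaluate(path: Path) -> int:
    """Apply every step in `path` to START and return the resulting value."""
    value = START
    for step in path:
        value = STEPS[step](value)
    return value


def toy_expand(path: Path) -> List[str]:
    """Stand-in for an LLM proposing candidate next steps."""
    return list(STEPS)


def toy_reward(path: Path) -> float:
    """Stand-in for a step-wise reward model: closer to TARGET is better."""
    return -abs(TARGET - evaluate(path))


def beam_search(
    expand: Callable[[Path], List[str]],
    score: Callable[[Path], float],
    beam_width: int,
    max_steps: int,
) -> Path:
    """Score partial solutions after every step and keep only the best few."""
    beams: List[Path] = [[]]
    best: Path = []
    best_score = score(best)
    for _ in range(max_steps):
        candidates = [path + [step] for path in beams for step in expand(path)]
        if not candidates:
            break
        # Rank every extended path with the reward function, best first.
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
        top_score = score(beams[0])
        if top_score > best_score:
            best, best_score = beams[0], top_score
    return best


if __name__ == "__main__":
    solution = beam_search(toy_expand, toy_reward, beam_width=2, max_steps=4)
    print(solution, "->", evaluate(solution))
```

In this toy setting, raising `beam_width` or `max_steps` corresponds to spending more compute at test time. A real setup would replace the two stand-in functions with model calls.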

lhoestq
posted an update 10 days ago
Made an HF Dataset editor à la Google Sheets here: lhoestq/dataset-spreadsheets

With Dataset Spreadsheets:
✏️ Edit datasets in the UI
🔗 Share link with collaborators
🐍 Use locally in DuckDB or Python

Available for the 100,000+ parquet datasets on HF :)

Narsil
posted an update 10 days ago
Performance leap: TGI v3 is out. Processes 3x more tokens, 13x faster than vLLM on long prompts. Zero config!

3x more tokens.

By reducing our memory footprint, we're able to ingest many more tokens, and more dynamically, than before. A single L4 (24GB) can handle 30k tokens on Llama 3.1-8B, while vLLM gets barely 10k. A lot of work went into reducing the footprint of the runtime, and its effects are best seen in smaller, constrained environments.

13x faster

On long prompts (200k+ tokens), conversation replies take 27.5s in vLLM, while they take only 2s in TGI. How so? We keep the initial conversation around, so when a new reply comes in, we can answer almost instantly. The overhead of the lookup is ~5us. Thanks @Daniël de Kok for the beast data structure.
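
The post does not describe the data structure itself, so the following is only a minimal sketch of the general idea of reusing work for a shared conversation prefix. `PrefixCache`, its methods, and `tokens_to_compute` are hypothetical names, and a plain trie over token ids is an assumption made for illustration, not TGI's implementation.

```python
from typing import Dict, List


class PrefixCache:
    """Toy trie over token ids that remembers which prefixes were processed."""

    def __init__(self) -> None:
        self._root: Dict[int, dict] = {}

    def insert(self, tokens: List[int]) -> None:
        """Record that every prefix of `tokens` has been processed."""
        node = self._root
        for token in tokens:
            node = node.setdefault(token, {})

    def longest_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens of `tokens` are already cached."""
        node = self._root
        matched = 0
        for token in tokens:
            if token not in node:
                break
            node = node[token]
            matched += 1
        return matched


def tokens_to_compute(cache: PrefixCache, prompt: List[int]) -> int:
    """Count the tokens that still need processing, then cache the prompt."""
    reused = cache.longest_prefix(prompt)
    cache.insert(prompt)
    return len(prompt) - reused


if __name__ == "__main__":
    cache = PrefixCache()
    conversation = [11, 42, 7, 99, 5]  # made-up token ids
    # First turn: nothing is cached yet, so all 5 tokens must be processed.
    print(tokens_to_compute(cache, conversation))
    # Follow-up turn: the earlier conversation is reused, only 2 tokens are new.
    print(tokens_to_compute(cache, conversation + [3, 8]))
```

The point of the sketch is that a follow-up request pays only for the tokens after the cached prefix, which is why a very long earlier conversation need not be processed again.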

Zero config

That's it. Remove all the flags you are using and you're likely to get the best performance. By evaluating the hardware and model, TGI carefully selects automatic values to give the best performance. In production, we don't have any flags anymore in our deployments. We kept all existing flags around; they may come in handy in niche scenarios.

Read more: https://huggingface.co/docs/text-generation-inference/conceptual/chunking

sayakpaul
posted an update 12 days ago
Introducing a high-quality open-preference dataset to further this line of research for image generation.

Despite being such an inseparable component of modern image generation, open preference datasets are a rarity!

So, we decided to work on one with the community!

Check it out here:
https://huggingface.co/blog/image-preferences

sayakpaul
posted an update 13 days ago
The Control family of Flux from @black-forest-labs should be discussed more!

It enables structural controls like ControlNets while being significantly less expensive to run!

So, we're working on a Control LoRA training script 🤗

It's still WIP, so go easy:
https://github.com/huggingface/diffusers/pull/10130

Xenova
posted an update 14 days ago
Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! 🔥 High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. 🤗 Try it out yourself!

Demo: webml-community/text-to-speech-webgpu
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu
Model: onnx-community/OuteTTS-0.2-500M (ONNX), OuteAI/OuteTTS-0.2-500M (PyTorch)

sayakpaul
posted an update 23 days ago

Xenova
posted an update 24 days ago
We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look:
🔀 Janus from DeepSeek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
👁️ Qwen2-VL from Qwen for dynamic-resolution image understanding
🔢 JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
🌋 LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
🤸‍♀️ ViTPose for pose estimation
📄 MGP-STR for optical character recognition (OCR)
📈 PatchTST & PatchTSMixer for time series forecasting

That's right, everything running 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy!

Check out the release notes for more information. 👇
https://github.com/huggingface/transformers.js/releases/tag/3.1.0

Demo link (+ source code): webml-community/Janus-1.3B-WebGPU

albertvillanova
posted an update about 1 month ago
🚨 How green is your model? 🌱 Introducing a new feature in the Comparator tool: Environmental Impact for responsible #LLM research!
👉 open-llm-leaderboard/comparator
Now, you can not only compare models by performance, but also by their environmental footprint!

🌍 The Comparator calculates CO₂ emissions during evaluation and shows key model characteristics: evaluation score, number of parameters, architecture, precision, type... 🛠️
Make informed decisions about your model's impact on the planet and join the movement towards greener AI!

Xenova
posted an update about 1 month ago
Have you tried out 🤗 Transformers.js v3? Here are the new features:
⚡ WebGPU support (up to 100x faster than WASM)
🔢 New quantization formats (dtypes)
🏛 120 supported architectures in total
📂 25 new example projects and templates
🤖 Over 1200 pre-converted models
🌐 Node.js (ESM + CJS), Deno, and Bun compatibility
🏡 A new home on GitHub and NPM

Get started with npm i @huggingface/transformers.

Learn more in our blog post: https://huggingface.co/blog/transformersjs-v3

ArthurZ
posted an update about 1 month ago
Native tensor parallel has landed in transformers!!! https://github.com/huggingface/transformers/pull/34184 Thanks a lot to the torch team for their support!

Contributions are welcome to support more models! 🔥

sayakpaul
posted an update about 1 month ago
It's been a while since we shipped native quantization support in diffusers 🧨

We currently support bitsandbytes as the official backend, but using others like torchao is already very simple.

This post is just a reminder of what's possible:

1. Loading a model with a quantization config
2. Saving a model with quantization config
3. Loading a pre-quantized model
4. enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints

Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes

albertvillanova
posted an update about 2 months ago
🚀 New feature of the Comparator of the 🤗 Open LLM Leaderboard: now compare models with their base versions & derivatives (finetunes, adapters, etc.). Perfect for tracking how adjustments affect performance & seeing innovations in action. Dive deeper into the leaderboard!

🛠️ Here's how to use it:
1. Select your model from the leaderboard.
2. Load its model tree.
3. Choose any base & derived models (adapters, finetunes, merges, quantizations) for comparison.
4. Press Load.
See side-by-side performance metrics instantly!

Ready to dive in? 🏆 Try the 🤗 Open LLM Leaderboard Comparator now! See how models stack up against their base versions and derivatives to understand fine-tuning and other adjustments. Easier model analysis for better insights! Check it out here: open-llm-leaderboard/comparator 🌐