rmbg (rmbg)

Xenova

posted an update about 1 month ago

Post

9030

We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses
🌍 Multilingual support (only phonemization left)

Who wants to help?

9 replies

·

Xenova

authored a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 201

Xenova

posted an update about 2 months ago

Post

6373

Introducing Kokoro.js, a new JavaScript library for running Kokoro TTS, an 82 million parameter text-to-speech model, 100% locally in the browser w/ WASM. Powered by 🤗 Transformers.js. WebGPU support coming soon!
👉 npm i kokoro-js 👈

Try it out yourself: webml-community/kokoro-web
Link to models/samples: onnx-community/Kokoro-82M-ONNX

You can get started in just a few lines of code!

import { KokoroTTS } from "kokoro-js";

const tts = await KokoroTTS.from_pretrained(
  "onnx-community/Kokoro-82M-ONNX",
  { dtype: "q8" }, // fp32, fp16, q8, q4, q4f16
);

const text = "Life is like a box of chocolates. You never know what you're gonna get.";
const audio = await tts.generate(text,
  { voice: "af_sky" }, // See `tts.list_voices()`
);
audio.save("audio.wav");

Huge kudos to the Kokoro TTS community, especially taylorchu for the ONNX exports and Hexgrad for the amazing project! None of this would be possible without you all! 🤗

The model is also extremely resilient to quantization. The smallest variant is only 86 MB in size (down from the original 326 MB), with no noticeable difference in audio quality! 🤯

10 replies

·

Xenova

posted an update 2 months ago

Post

8347

First project of 2025: Vision Transformer Explorer

I built a web app to interactively explore the self-attention maps produced by ViTs. This explains what the model is focusing on when making predictions, and provides insights into its inner workings! 🤯

Try it out yourself! 👇
webml-community/attention-visualization

Source code: https://github.com/huggingface/transformers.js-examples/tree/main/attention-visualization

Xenova

posted an update 3 months ago

Post

4239

Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
🚀 Faster and more accurate than Whisper
🔒 Privacy-focused (no data leaves your device)
⚡️ WebGPU accelerated (w/ WASM fallback)
🔥 Powered by ONNX Runtime Web and Transformers.js

Demo: webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web

5 replies

·

Xenova

posted an update 3 months ago

Post

3275

Introducing TTS WebGPU: The first ever text-to-speech web app built with WebGPU acceleration! 🔥 High-quality and natural speech generation that runs 100% locally in your browser, powered by OuteTTS and Transformers.js. 🤗 Try it out yourself!

Demo: webml-community/text-to-speech-webgpu
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/text-to-speech-webgpu
Model: onnx-community/OuteTTS-0.2-500M (ONNX), OuteAI/OuteTTS-0.2-500M (PyTorch)

Xenova

posted an update 3 months ago

Post

4115

We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look:
🔀 Janus from Deepseek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
👁️ Qwen2-VL from Qwen for dynamic-resolution image understanding
🔢 JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
🌋 LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
🤸‍♀️ ViTPose for pose estimation
📄 MGP-STR for optical character recognition (OCR)
📈 PatchTST & PatchTSMixer for time series forecasting

That's right, everything running 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy!

Check out the release notes for more information. 👇
https://github.com/huggingface/transformers.js/releases/tag/3.1.0

Demo link (+ source code): webml-community/Janus-1.3B-WebGPU

Xenova

posted an update 4 months ago

Post

5786

Have you tried out 🤗 Transformers.js v3? Here are the new features:
⚡ WebGPU support (up to 100x faster than WASM)
🔢 New quantization formats (dtypes)
🏛 120 supported architectures in total
📂 25 new example projects and templates
🤖 Over 1200 pre-converted models
🌐 Node.js (ESM + CJS), Deno, and Bun compatibility
🏡 A new home on GitHub and NPM

Get started with npm i @huggingface/transformers.

Learn more in our blog post: https://huggingface.co/blog/transformersjs-v3

3 replies

·

Xenova

posted an update 7 months ago

Post

14024

I can't believe this... Phi-3.5-mini (3.8B) running in-browser at ~90 tokens/second on WebGPU w/ Transformers.js and ONNX Runtime Web! 🤯 Since everything runs 100% locally, no messages are sent to a server — a huge win for privacy!
- 🤗 Demo: webml-community/phi-3.5-webgpu
- 🧑‍💻 Source code: https://github.com/huggingface/transformers.js-examples/tree/main/phi-3.5-webgpu

11 replies

·

Xenova

posted an update 7 months ago

Post

15075

I'm excited to announce that Transformers.js V3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️

Install it from NPM with:
𝚗𝚙𝚖 𝚒 @𝚑𝚞𝚐𝚐𝚒𝚗𝚐𝚏𝚊𝚌𝚎/𝚝𝚛𝚊𝚗𝚜𝚏𝚘𝚛𝚖𝚎𝚛𝚜

or via CDN, for example: https://v2.scrimba.com/s0lmm0qh1q

Segment Anything demo: webml-community/segment-anything-webgpu

5 replies

·

Xenova

posted an update 8 months ago

Post

8069

Introducing Whisper Diarization: Multilingual speech recognition with word-level timestamps and speaker segmentation, running 100% locally in your browser thanks to 🤗 Transformers.js!

Tested on this iconic Letterman interview w/ Grace Hopper from 1983!
- Demo: Xenova/whisper-speaker-diarization
- Source code: Xenova/whisper-speaker-diarization

1 reply

·

Xenova

posted an update 8 months ago

Post

6865

Introducing Whisper Timestamped: Multilingual speech recognition with word-level timestamps, running 100% locally in your browser thanks to 🤗 Transformers.js! Check it out!
👉 Xenova/whisper-word-level-timestamps 👈

This unlocks a world of possibilities for in-browser video editing! 🤯 What will you build? 😍

Source code: https://github.com/xenova/transformers.js/tree/v3/examples/whisper-word-timestamps

1 reply

·

Xenova

posted an update 8 months ago

Post

6164

Chrome's new window.ai feature is going to change the web forever! 🤯 It allows you to run Gemini Nano, a powerful 3.25B parameter LLM, 100% locally in your browser!

We've also added experimental support to 🤗 Transformers.js!
- Demo: Xenova/experimental-built-in-ai-chat
- Blog post: https://huggingface.co/blog/Xenova/run-gemini-nano-in-your-browser

5 replies

·

Xenova

posted an update 9 months ago

Post

6074

Florence-2, the new vision foundation model by Microsoft, can now run 100% locally in your browser on WebGPU, thanks to Transformers.js! 🤗🤯

It supports tasks like image captioning, optical character recognition, object detection, and many more! 😍 WOW!
- Demo: Xenova/florence2-webgpu
- Models: https://huggingface.co/models?library=transformers.js&other=florence2
- Source code: https://github.com/xenova/transformers.js/tree/v3/examples/florence2-webgpu

Xenova

posted an update 9 months ago

Post

10345

Introducing Whisper WebGPU: Blazingly-fast ML-powered speech recognition directly in your browser! 🚀 It supports multilingual transcription and translation across 100 languages! 🤯

The model runs locally, meaning no data leaves your device! 😍

Check it out! 👇
- Demo: Xenova/whisper-webgpu
- Source code: https://github.com/xenova/whisper-web/tree/experimental-webgpu

7 replies

·

Xenova

posted an update 10 months ago

Post

11551

Introducing Phi-3 WebGPU, a private and powerful AI chatbot that runs 100% locally in your browser, powered by 🤗 Transformers.js and onnxruntime-web!

🔒 On-device inference: no data sent to a server
⚡️ WebGPU-accelerated (> 20 t/s)
📥 Model downloaded once and cached

Try it out: Xenova/experimental-phi3-webgpu

5 replies

·

Xenova

posted an update 11 months ago

Post

13029

Introducing MusicGen Web: AI-powered music generation directly in your browser, built with 🤗 Transformers.js! 🎵

Everything runs 100% locally, meaning there are no calls to an API! 🤯 Since it's served as a static HF space, it costs $0 to host and run! 🔥

We also added the ability to share your generated music to the discussion tab, so give it a try! 👇
Xenova/musicgen-web

2 replies

·

Xenova

posted an update about 1 year ago

Post

Introducing the 🤗 Transformers.js WebGPU Embedding Benchmark! ⚡️
👉 Xenova/webgpu-embedding-benchmark 👈

On my device, I was able to achieve a 64.04x speedup over WASM! 🤯 How much does WebGPU speed up ML models running locally in your browser? Try it out and share your results! 🚀

3 replies

·

Xenova

posted an update about 1 year ago

Post

Real-time object detection w/ 🤗 Transformers.js, running YOLOv9 locally in your browser! 🤯

Try it out yourself: Xenova/video-object-detection
(Model used + example code: Xenova/gelan-c_all)

This demo shows why on-device ML is so important:
1. Privacy - local inference means no user data is sent to the cloud
2. No server latency - empowers developers to build real-time applications
3. Lower costs - no need to pay for bandwidth and processing of streamed video

I can't wait to see what you build with it! 🔥

3 replies

·

Xenova

posted an update about 1 year ago

Post

Introducing Remove Background Web: In-browser background removal, powered by @briaai 's new RMBG-v1.4 model and 🤗 Transformers.js!

Everything runs 100% locally, meaning none of your images are uploaded to a server! 🤯 At only ~45MB, the 8-bit quantized version of the model is perfect for in-browser usage (it even works on mobile).

Check it out! 👇
Demo: Xenova/remove-background-web
Model: briaai/RMBG-1.4

9 replies

·

rmbg

AI & ML interests

Recent Activity

rmbg's activity

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

AI & ML interests

Recent Activity

Team members 1

rmbg's activity