Julien BLANCHON PRO

blanchon

AI & ML interests

Math

Recent Activity

reacted to Xenova's post with 🔥 about 5 hours ago

Introducing Moonshine Web: real-time speech recognition running 100% locally in your browser!
🚀 Faster and more accurate than Whisper
🔒 Privacy-focused (no data leaves your device)
⚡️ WebGPU accelerated (w/ WASM fallback)
🔥 Powered by ONNX Runtime Web and Transformers.js
Demo: https://huggingface.co/spaces/webml-community/moonshine-web
Source code: https://github.com/huggingface/transformers.js-examples/tree/main/moonshine-web

reacted to Xenova's post with ❤️ about 5 hours ago

reacted to toshas's post with 😎 about 5 hours ago

Introducing ⇆ Marigold-DC: our training-free zero-shot approach to monocular Depth Completion with guided diffusion! If you have ever wondered how else a long denoising diffusion schedule can be useful, we have an answer for you!

Depth Completion addresses sparse, incomplete, or noisy measurements from photogrammetry or sensors like LiDAR. Sparse points aren’t just hard for humans to interpret; they also hinder downstream tasks. Traditionally, depth completion was framed as image-guided depth interpolation. We leverage Marigold, a diffusion-based monodepth model, to reframe it as sparse-depth-guided depth generation. How the turntables! Check out the paper anyway 👇

🌎 Website: https://marigolddepthcompletion.github.io/
🤗 Demo: https://huggingface.co/spaces/prs-eth/marigold-dc
📕 Paper: https://arxiv.org/abs/2412.13389
👾 Code: https://github.com/prs-eth/marigold-dc

Team ETH Zürich: Massimiliano Viola (@mviola), Kevin Qu (@KevinQu7), Nando Metzger (@nandometzger), Bingxin Ke (@Bingxin), Alexander Becker, Konrad Schindler, and Anton Obukhov (@toshas). We thank Hugging Face for their continuous support.

blanchon's activity

commented on a paper 2 months ago

WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines

Paper • 2410.12705 • Published Oct 16 • 29

New activity in blanchon/PixDiet 2 months ago

Potential benchmark

#1 opened 2 months ago by

New activity in cbensimon/zerogpu-quickstart 2 months ago

Create app.py

#7 opened 2 months ago by

New activity in enzostvs/lora-studio 6 months ago

add-animate-flip-everywhere

#14 opened 6 months ago by

New activity in codys12/MergeLlama 6 months ago

fix dataset

#1 opened about 1 year ago by

commented on 15 papers 7 months ago

Guiding a Diffusion Model with a Bad Version of Itself

Paper • 2406.02507 • Published Jun 4 • 15

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

Paper • 2406.02523 • Published Jun 4 • 10

V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation

Paper • 2406.02511 • Published Jun 4 • 9

I4VGen: Image as Stepping Stone for Text-to-Video Generation

Paper • 2406.02230 • Published Jun 4 • 16

Self-Improving Robust Preference Optimization

Paper • 2406.01660 • Published Jun 3 • 18

To Believe or Not to Believe Your LLM

Paper • 2406.02543 • Published Jun 4 • 32

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Paper • 2406.02430 • Published Jun 4 • 30

PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

Paper • 2406.02886 • Published Jun 5 • 8

Item-Language Model for Conversational Recommendation

Paper • 2406.02844 • Published Jun 5 • 8

Searching Priors Makes Text-to-Video Synthesis Better

Paper • 2406.03215 • Published Jun 5 • 11

PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

Paper • 2406.02884 • Published Jun 5 • 15

Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5 • 18

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration

Paper • 2406.01014 • Published Jun 3 • 31

Parrot: Multilingual Visual Instruction Tuning

Paper • 2406.02539 • Published Jun 4 • 35

Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4 • 37