Julien Chaumond PRO

julien-c

AI & ML interests

<3 ML/AI for everyone, building products to propel communities fwd

Recent Activity

liked a model about 6 hours ago
meditsolutions/Llama-3.2-SUN-1B-Instruct
liked a dataset about 10 hours ago
HuggingFaceTB/smoltalk
liked a model about 11 hours ago
HuggingFaceTB/SmolVLM-Instruct

julien-c's activity

liked a Space about 12 hours ago
Reacted to jsulz's post with ❤️🔥 6 days ago
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That’s where our chunk-based approach comes in.

Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:

⏩ Only upload the chunks that changed.
🚀 Download just the updates, not the whole file.
🧠 We store your file as deduplicated chunks.
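
To make the idea concrete, here is a minimal sketch of content-defined chunking with a deduplicating store. It is an illustration only, not the Hub's actual backend: the Gear-style rolling hash, the size limits, and the names `split_chunks` and `ChunkStore` are all assumptions chosen for readability.

```python
import hashlib

# Gear table: 256 pseudo-random 32-bit values, derived deterministically.
# (Illustrative choice; any fixed random table would work.)
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
        for i in range(256)]

MASK = 0xFF000000  # cut when the top 8 hash bits are zero (~1 in 256 positions)
MIN_SIZE = 64      # assumed lower bound on chunk size, in bytes
MAX_SIZE = 1024    # assumed upper bound on chunk size, in bytes


def split_chunks(data: bytes) -> list[bytes]:
    """Split data into variable-sized chunks at content-defined boundaries."""
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Rolling hash: older bytes shift out of the 32-bit state, so the
        # value depends only on the most recent bytes, not on file offsets.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class ChunkStore:
    """Toy deduplicating store: chunks are keyed by their SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def upload(self, data: bytes) -> tuple[list[str], int]:
        """Store data; return its chunk manifest and the bytes newly stored."""
        manifest = []
        new_bytes = 0
        for chunk in split_chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:  # only unseen chunks cost storage
                self.chunks[key] = chunk
                new_bytes += len(chunk)
            manifest.append(key)
        return manifest, new_bytes

    def download(self, manifest: list[str]) -> bytes:
        """Rebuild a file from its manifest."""
        return b"".join(self.chunks[key] for key in manifest)


# Two versions of a file: v2 is v1 with a small insertion in the middle.
v1 = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(1000))
v2 = v1[:10000] + b"a small edit" + v1[10000:]

store = ChunkStore()
manifest_v1, new_v1 = store.upload(v1)
manifest_v2, new_v2 = store.upload(v2)
print(f"v1: {len(v1)} bytes, {new_v1} bytes newly stored")
print(f"v2: {len(v2)} bytes, {new_v2} bytes newly stored")
```

Because boundaries depend on the bytes themselves rather than on fixed offsets, an insertion shifts the data without changing most chunks, so the second upload only needs to store the chunks around the edit.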

In our benchmarks, we found that using content-defined chunking (CDC) to store iterative model and dataset versions led to transfer speedups of ~2x, but this isn’t just a performance boost. It’s a rethinking of how we manage models and datasets on the Hub.

We're planning to bring our new storage backend to the Hub in early 2025. Check out our blog to dive deeper, and let us know: how could this improve your workflows?

https://huggingface.co/blog/from-files-to-chunks
Reacted to jsulz's post with 🚀 7 days ago
In August, the XetHub team joined Hugging Face (https://huggingface.co/blog/xethub-joins-hf), and we’ve been rolling up our sleeves to bring the best of both worlds together. We started with a deep dive into the current state of files stored with Git LFS on the Hub.

Getting this information was no small feat. We had to:
* Analyze a complete database dump of all repositories and files stored in Git LFS across Hugging Face.
* Parse through metadata on file sizes and types to accurately map the storage breakdown across Spaces, Models, and Datasets.
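
As a rough sketch of the second step, the aggregation could look like the following. The record layout (`repo_type`, `path`, `size`) and the sample rows are made-up placeholders for illustration, not the real dump schema or real numbers.

```python
import os
from collections import defaultdict

# Hypothetical file metadata rows; the field names and values are invented.
records = [
    {"repo_type": "model", "path": "model.safetensors", "size": 4000},
    {"repo_type": "model", "path": "pytorch_model.bin", "size": 3000},
    {"repo_type": "model", "path": "shard-2.safetensors", "size": 1000},
    {"repo_type": "dataset", "path": "train.parquet", "size": 2500},
    {"repo_type": "space", "path": "demo.mp4", "size": 500},
]


def storage_breakdown(rows):
    """Sum file sizes per repository type and per file extension."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        ext = os.path.splitext(row["path"])[1] or "(none)"
        totals[row["repo_type"]][ext] += row["size"]
    return {repo_type: dict(by_ext) for repo_type, by_ext in totals.items()}


breakdown = storage_breakdown(records)
for repo_type, by_ext in breakdown.items():
    print(repo_type, by_ext)
```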

You can read more about the findings (with some jaw-dropping stats + charts) here https://www.linkedin.com/feed/update/urn:li:activity:7244486280351285248