Banerjee
port8080
12 followers
·
2 following
port8080
AI & ML interests
datasets
Recent Activity
reacted to jsulz's post with 👍 16 days ago
Doing a lot of benchmarking and visualization work, which means I'm always searching for interesting repos in terms of file types, size, branches, and overall structure. To help, I built a Space https://huggingface.co/spaces/jsulz/repo-info that lets you search for any repo and get back:
- Treemap of the repository, color coded by file/directory size
- Repo branches and their size
- Cumulative size of different file types (e.g., the total size of all the safetensors in the repo)

And because I'm interested in how this will fit in our work to leverage content-defined chunking for versioning repos on the Hub - https://huggingface.co/blog/from-files-to-chunks - everything has the number of chunks (1 chunk = 64KB) as well as the total size in bytes.

Some of the treemaps are pretty cool. Attached are https://huggingface.co/black-forest-labs/FLUX.1-dev and for fun https://huggingface.co/datasets/laion/laion-audio-preview (which has nearly 10k .tar files 🤯)
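As a rough illustration of the "1 chunk = 64KB" figure in the post above, here is a minimal sketch that converts a size in bytes into an approximate chunk count. It assumes that 64KB means 65,536 bytes and that the count is a simple ceiling division; the function name `estimate_chunks` is hypothetical, and the Space itself may compute its numbers differently.

```python
# Assumption: "64KB" in the post means 64 KiB (65,536 bytes).
CHUNK_SIZE_BYTES = 64 * 1024


def estimate_chunks(size_bytes: int, chunk_size: int = CHUNK_SIZE_BYTES) -> int:
    """Approximate how many fixed-size chunks cover `size_bytes` (ceiling division)."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Integer ceiling division avoids floating-point rounding on very large files.
    return (size_bytes + chunk_size - 1) // chunk_size


if __name__ == "__main__":
    # A hypothetical 1 GiB file.
    print(estimate_chunks(1024 ** 3))
```

With content-defined chunking the real chunks vary in size, so a figure like this is best read as an estimate based on the average chunk size rather than an exact count.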
reacted to jsulz's post with 🔥 16 days ago
reacted to jsulz's post with 🔥 about 1 month ago
When the XetHub crew joined Hugging Face this fall, @erinys and I started brainstorming how to share our work to replace Git LFS on the Hub. Uploading and downloading large models and datasets takes precious time. That's where our chunk-based approach comes in. Instead of versioning files (like Git and Git LFS), we version variable-sized chunks of data. For the Hugging Face community, this means:
⏩ Only upload the chunks that changed.
🚀 Download just the updates, not the whole file.
🧠 We store your file as deduplicated chunks.

In our benchmarks, we found that using content-defined chunking (CDC) to store iterative model and dataset versions led to transfer speedups of ~2x, but this isn't just a performance boost. It's a rethinking of how we manage models and datasets on the Hub. We're planning to bring our new storage backend to the Hub in early 2025 - check out our blog to dive deeper, and let us know: how could this improve your workflows? https://huggingface.co/blog/from-files-to-chunks
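To make the chunk-based idea in the post above concrete, here is a minimal sketch of content-defined chunking with a deduplicating store. It is not the Hub's implementation: the gear-style rolling hash, the toy size parameters, and the names `cdc_chunks`, `store_version`, and `restore` are all assumptions chosen for illustration.

```python
import hashlib
import random

# Assumption: a fixed pseudo-random "gear" table, one 32-bit value per byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes, mask: int = 0x3F, min_size: int = 16,
               max_size: int = 256) -> list[bytes]:
    """Split data at positions chosen by a rolling hash of the content itself."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Older bytes shift out of the 32-bit hash, so it reflects recent content only.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        # Cut when the hash matches the mask (past min_size) or the chunk hits max_size.
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def store_version(data: bytes, chunk_store: dict) -> list[str]:
    """Store a file version as a list of chunk hashes; identical chunks are kept once."""
    keys = []
    for chunk in cdc_chunks(data):
        key = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(key, chunk)  # only new chunks take up space
        keys.append(key)
    return keys


def restore(keys: list[str], chunk_store: dict) -> bytes:
    """Rebuild a file version from its list of chunk hashes."""
    return b"".join(chunk_store[k] for k in keys)


if __name__ == "__main__":
    chunk_store = {}
    v1 = bytes(random.Random(1).randrange(256) for _ in range(4096))
    v2 = v1 + b"a few appended bytes"
    keys_v1 = store_version(v1, chunk_store)
    before = len(chunk_store)
    keys_v2 = store_version(v2, chunk_store)
    print(f"v2 has {len(keys_v2)} chunks, {len(chunk_store) - before} newly stored")
```

In this sketch, appending to a file leaves every earlier chunk except possibly the last one unchanged, so only the tail has to be stored or transferred again. Because boundaries depend on content rather than fixed offsets, edits in the middle of a file tend to disturb only nearby chunks as well, although this toy version makes no guarantee about how quickly boundaries realign.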
Articles
Rearchitecting Hugging Face Uploads and Downloads
27 days ago
•
37
Organizations
port8080's activity
upvoted an article 3 months ago
Article
Improving Parquet Dedupe on Hugging Face Hub
Oct 5
•
31
upvoted an article 5 months ago
Article
XetHub is joining Hugging Face!
Aug 8
•
81