scikit-learn
community
AI & ML interests
Building interactive demos to scikit-learn examples 🧡
Recent Activity
View all activity
sklearn-docs's activity
prithivMLmods
posted
an
update
about 2 hours ago
suayptalha
posted
an
update
2 days ago
Post
1375
🚀 FastLlama Series is Live!
🦾 Experience faster, lighter, and smarter language models! The new FastLlama makes Meta's LLaMA models work with smaller file sizes, lower system requirements, and higher performance. The model supports 8 languages, including English, German, and Spanish.
🤖 Built on the LLaMA 3.2-1B-Instruct model, fine-tuned with Hugging Face's SmolTalk and MetaMathQA-50k datasets, and powered by LoRA (Low-Rank Adaptation) for groundbreaking mathematical reasoning.
💻 Its compact size makes it versatile for a wide range of applications!
💬 Chat with the model:
🔗 Chat Link: suayptalha/Chat-with-FastLlama
🔗 Model Link: suayptalha/FastLlama-3.2-1B-Instruct
🦾 Experience faster, lighter, and smarter language models! The new FastLlama makes Meta's LLaMA models work with smaller file sizes, lower system requirements, and higher performance. The model supports 8 languages, including English, German, and Spanish.
🤖 Built on the LLaMA 3.2-1B-Instruct model, fine-tuned with Hugging Face's SmolTalk and MetaMathQA-50k datasets, and powered by LoRA (Low-Rank Adaptation) for groundbreaking mathematical reasoning.
💻 Its compact size makes it versatile for a wide range of applications!
💬 Chat with the model:
🔗 Chat Link: suayptalha/Chat-with-FastLlama
🔗 Model Link: suayptalha/FastLlama-3.2-1B-Instruct
prithivMLmods
posted
an
update
3 days ago
Post
1498
Qwen2VL Models: Vision and Language Processing 🍉
📍FT; [ Latex OCR, Math Parsing, Text Analogy OCRTest ]
❄️Demo : prithivMLmods/Qwen2-VL-2B . The demo includes the Qwen2VL 2B Base Model.
🎯The space handles documenting content from the input image along with standardized plain text. It includes adjustment tools with over 30 font styles, file formatting support for PDF and DOCX, textual alignments, font size adjustments, and line spacing modifications.
📄PDFs are rendered using the ReportLab software library toolkit.
🧵Models :
+ prithivMLmods/Qwen2-VL-OCR-2B-Instruct
+ prithivMLmods/Qwen2-VL-Ocrtest-2B-Instruct
+ prithivMLmods/Qwen2-VL-Math-Prase-2B-Instruct
🚀Sample Document :
+ https://drive.google.com/file/d/1Hfqqzq4Xc-3eTjbz-jcQY84V5E1YM71E/view?usp=sharing
📦Collection :
+ prithivMLmods/vision-language-models-67639f790e806e1f9799979f
.
.
.
@prithivMLmods 🤗
📍FT; [ Latex OCR, Math Parsing, Text Analogy OCRTest ]
❄️Demo : prithivMLmods/Qwen2-VL-2B . The demo includes the Qwen2VL 2B Base Model.
🎯The space handles documenting content from the input image along with standardized plain text. It includes adjustment tools with over 30 font styles, file formatting support for PDF and DOCX, textual alignments, font size adjustments, and line spacing modifications.
📄PDFs are rendered using the ReportLab software library toolkit.
🧵Models :
+ prithivMLmods/Qwen2-VL-OCR-2B-Instruct
+ prithivMLmods/Qwen2-VL-Ocrtest-2B-Instruct
+ prithivMLmods/Qwen2-VL-Math-Prase-2B-Instruct
🚀Sample Document :
+ https://drive.google.com/file/d/1Hfqqzq4Xc-3eTjbz-jcQY84V5E1YM71E/view?usp=sharing
📦Collection :
+ prithivMLmods/vision-language-models-67639f790e806e1f9799979f
.
.
.
@prithivMLmods 🤗
prithivMLmods
posted
an
update
4 days ago
Post
3088
🎄 Here Before - Xmas🎅✨
🧑🏻🎄Models
+ [ Xmas 2D Illustration ] : strangerzonehf/Flux-Xmas-Illustration-LoRA
+ [ Xmas 3D Art ] : strangerzonehf/Flux-Xmas-3D-LoRA
+ [ Xmas Chocolate ] : strangerzonehf/Flux-Xmas-Chocolate-LoRA
+ [ Xmas Isometric Kit ] : strangerzonehf/Flux-Xmas-Isometric-Kit-LoRA
+ [ Xmas Realpix ] : strangerzonehf/Flux-Xmas-Realpix-LoRA
+ [ Xmas Anime ] : strangerzonehf/Flux-Anime-Xmas-LoRA
❄️Collections
+ [ Xmas Art ] : strangerzonehf/christmas-pack-6758b199487adafaddb68f82
+ [ Stranger Zone Collection ] : prithivMLmods/stranger-zone-collections-org-6737118adcf2cb40d66d0c7e
🥶Page
+ [ Stranger Zone ] : https://huggingface.co/strangerzonehf
.
.
.
@prithivMLmods 🤗
🧑🏻🎄Models
+ [ Xmas 2D Illustration ] : strangerzonehf/Flux-Xmas-Illustration-LoRA
+ [ Xmas 3D Art ] : strangerzonehf/Flux-Xmas-3D-LoRA
+ [ Xmas Chocolate ] : strangerzonehf/Flux-Xmas-Chocolate-LoRA
+ [ Xmas Isometric Kit ] : strangerzonehf/Flux-Xmas-Isometric-Kit-LoRA
+ [ Xmas Realpix ] : strangerzonehf/Flux-Xmas-Realpix-LoRA
+ [ Xmas Anime ] : strangerzonehf/Flux-Anime-Xmas-LoRA
❄️Collections
+ [ Xmas Art ] : strangerzonehf/christmas-pack-6758b199487adafaddb68f82
+ [ Stranger Zone Collection ] : prithivMLmods/stranger-zone-collections-org-6737118adcf2cb40d66d0c7e
🥶Page
+ [ Stranger Zone ] : https://huggingface.co/strangerzonehf
.
.
.
@prithivMLmods 🤗
Post
1992
Aya by Cohere For AI can now see! 👀
C4AI community has built Maya 8B, a new open-source multilingual VLM built on SigLIP and Aya 8B 🌱 works on 8 languages! 🗣️
The authors extend Llava dataset using Aya's translation capabilities with 558k examples!
ry it here kkr5155/maya_demo
Dataset maya-multimodal/pretrain
Model maya-multimodal/maya 👏
kudos @nahidalam and team
C4AI community has built Maya 8B, a new open-source multilingual VLM built on SigLIP and Aya 8B 🌱 works on 8 languages! 🗣️
The authors extend Llava dataset using Aya's translation capabilities with 558k examples!
ry it here kkr5155/maya_demo
Dataset maya-multimodal/pretrain
Model maya-multimodal/maya 👏
kudos @nahidalam and team
Post
2589
Apollo is a new family of open-source video language models by Meta, where 3B model outperforms most 7B models and 7B outperforms most 30B models 🧶
✨ the models come in 1.5B https://huggingface.co/Apollo-LMMs/Apollo-1_5B-t32, 3B https://huggingface.co/Apollo-LMMs/Apollo-3B-t32 and 7B https://huggingface.co/Apollo-LMMs/Apollo-7B-t32 with A2.0 license, based on Qwen1.5 & Qwen2
✨ the authors also release a benchmark dataset https://huggingface.co/spaces/Apollo-LMMs/ApolloBench
The paper has a lot of experiments (they trained 84 models!) about what makes the video LMs work ⏯️
Try the demo for best setup here https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
they evaluate sampling strategies, scaling laws for models and datasets, video representation and more!
> The authors find out that whatever design decision was applied to small models also scale properly when the model and dataset are scaled 📈 scaling dataset has diminishing returns for smaller models
> They evaluate frame sampling strategies, and find that FPS sampling is better than uniform sampling, and they find 8-32 tokens per frame optimal
> They also compare image encoders, they try a variation of models from shape optimized SigLIP to DINOv2
they find google/siglip-so400m-patch14-384 to be most powerful 🔥
> they also compare freezing different parts of models, training all stages with some frozen parts give the best yield
They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo 7B outperforms 30B models 🔥
✨ the models come in 1.5B https://huggingface.co/Apollo-LMMs/Apollo-1_5B-t32, 3B https://huggingface.co/Apollo-LMMs/Apollo-3B-t32 and 7B https://huggingface.co/Apollo-LMMs/Apollo-7B-t32 with A2.0 license, based on Qwen1.5 & Qwen2
✨ the authors also release a benchmark dataset https://huggingface.co/spaces/Apollo-LMMs/ApolloBench
The paper has a lot of experiments (they trained 84 models!) about what makes the video LMs work ⏯️
Try the demo for best setup here https://huggingface.co/spaces/Apollo-LMMs/Apollo-3B
they evaluate sampling strategies, scaling laws for models and datasets, video representation and more!
> The authors find out that whatever design decision was applied to small models also scale properly when the model and dataset are scaled 📈 scaling dataset has diminishing returns for smaller models
> They evaluate frame sampling strategies, and find that FPS sampling is better than uniform sampling, and they find 8-32 tokens per frame optimal
> They also compare image encoders, they try a variation of models from shape optimized SigLIP to DINOv2
they find google/siglip-so400m-patch14-384 to be most powerful 🔥
> they also compare freezing different parts of models, training all stages with some frozen parts give the best yield
They eventually release three models, where Apollo-3B outperforms most 7B models and Apollo 7B outperforms 30B models 🔥
prithivMLmods
posted
an
update
8 days ago
Post
1621
A complete RAG pipeline includes a reranker, which ranks the documents to find the best document 📓
Same goes for multimodal RAG, multimodal rerankers which we can integrate to multimodal RAG pipelines!
Learn how to build a complete multimodal RAG pipeline with vidore/colqwen2-v1.0 as retriever, lightonai/MonoQwen2-VL-v0.1 as reranker, Qwen/Qwen2-VL-7B-Instruct as VLM in this notebook that runs on a GPU as small as L4 🔥 https://huggingface.co/learn/cookbook/multimodal_rag_using_document_retrieval_and_reranker_and_vlms
Same goes for multimodal RAG, multimodal rerankers which we can integrate to multimodal RAG pipelines!
Learn how to build a complete multimodal RAG pipeline with vidore/colqwen2-v1.0 as retriever, lightonai/MonoQwen2-VL-v0.1 as reranker, Qwen/Qwen2-VL-7B-Instruct as VLM in this notebook that runs on a GPU as small as L4 🔥 https://huggingface.co/learn/cookbook/multimodal_rag_using_document_retrieval_and_reranker_and_vlms
eienmojiki
posted
an
update
10 days ago
Post
1367
👀 Introducing 2048 Game API: A RESTful API for the Classic Puzzle Game 🧩
I'm excited to share my latest project, 2048 Game API, a RESTful API that allows you to create, manage, and play games of 2048, a popular puzzle game where players slide numbered tiles to combine them and reach the goal of getting a tile with the value of 2048.
⭐ Features
Create new games with customizable board sizes (3-8)
Make moves (up, down, left, right) and get the updated game state
Get the current game state, including the board, score, and game over status
Delete games
Generate images of the game board with customizable themes (light and dark)
🔗 API Endpoints
🧩 Example Use Cases
- Create a new game with a 4x4 board:
- Make a move up:
- Get the current game state:
💕 Try it out!
- Demo: eienmojiki/2048
- Source: https://github.com/kogakisaki/koga-2048
- You can try out the API by running the server locally or using a tool like Postman to send requests to the API. I hope you enjoy playing 2048 with this API!
Let me know if you have any questions or feedback!
🐧 Mouse1 is our friend🐧
I'm excited to share my latest project, 2048 Game API, a RESTful API that allows you to create, manage, and play games of 2048, a popular puzzle game where players slide numbered tiles to combine them and reach the goal of getting a tile with the value of 2048.
⭐ Features
Create new games with customizable board sizes (3-8)
Make moves (up, down, left, right) and get the updated game state
Get the current game state, including the board, score, and game over status
Delete games
Generate images of the game board with customizable themes (light and dark)
🔗 API Endpoints
POST /api/games
- Create a new gameGET /api/games/:gameId
- Get the current game statePOST /api/games/:gameId/move
- Make a move (up, down, left, right)DELETE /api/games/:gameId
- Delete a gameGET /api/games/:gameId/image
- Generate an image of the game board🧩 Example Use Cases
- Create a new game with a 4x4 board:
curl -X POST -H "Content-Type: application/json" -d '{"size": 4}' http://localhost:3000/api/games
- Make a move up:
curl -X POST -H "Content-Type: application/json" -d '{"direction": "up"}' http://localhost:3000/api/games/:gameId/move
- Get the current game state:
curl -X GET http://localhost:3000/api/games/:gameId
💕 Try it out!
- Demo: eienmojiki/2048
- Source: https://github.com/kogakisaki/koga-2048
- You can try out the API by running the server locally or using a tool like Postman to send requests to the API. I hope you enjoy playing 2048 with this API!
Let me know if you have any questions or feedback!
🐧 Mouse1 is our friend🐧
Post
1578
First Global and Dense Open Embedding Dataset of Earth! 🌍 🤗
Introducing the Major TOM embeddings dataset, created in collaboration with CloudFerro S.A. 🔶 and Φ-lab at the European Space Agency (ESA) 🛰️. Together with @mikonvergence and Jędrzej S. Bojanowski, we present the first open-access dataset of Copernicus embeddings, offering dense, global coverage across the full acquisition areas of Sentinel-1 and Sentinel-2 sensors.
💡 Highlights:
📊 Data: Over 8 million Sentinel-1 & Sentinel-2 images processed, distilling insights from 9.368 trillion pixels of raw data.
🧠 Models: Foundation models include SigLIP, DINOv2, and SSL4EO.
📦 Scale: 62 TB of raw satellite data processed into 170M+ embeddings.
This project delivers open and free vectorized expansions of Major-TOM/README datasets, setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for countless applications.
🤗 Explore the datasets:
Major-TOM/Core-S2L1C-SSL4EO
Major-TOM/Core-S1RTC-SSL4EO
Major-TOM/Core-S2RGB-DINOv2
Major-TOM/Core-S2RGB-SigLIP
📖 Check paper: Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space (2412.05600)
💻 Code notebook: https://github.com/ESA-PhiLab/Major-TOM/blob/main/05-Generate-Major-TOM-Embeddings.ipynb
Introducing the Major TOM embeddings dataset, created in collaboration with CloudFerro S.A. 🔶 and Φ-lab at the European Space Agency (ESA) 🛰️. Together with @mikonvergence and Jędrzej S. Bojanowski, we present the first open-access dataset of Copernicus embeddings, offering dense, global coverage across the full acquisition areas of Sentinel-1 and Sentinel-2 sensors.
💡 Highlights:
📊 Data: Over 8 million Sentinel-1 & Sentinel-2 images processed, distilling insights from 9.368 trillion pixels of raw data.
🧠 Models: Foundation models include SigLIP, DINOv2, and SSL4EO.
📦 Scale: 62 TB of raw satellite data processed into 170M+ embeddings.
This project delivers open and free vectorized expansions of Major-TOM/README datasets, setting a new standard for embedding releases and enabling lightweight, scalable ingestion of Earth Observation (EO) data for countless applications.
🤗 Explore the datasets:
Major-TOM/Core-S2L1C-SSL4EO
Major-TOM/Core-S1RTC-SSL4EO
Major-TOM/Core-S2RGB-DINOv2
Major-TOM/Core-S2RGB-SigLIP
📖 Check paper: Global and Dense Embeddings of Earth: Major TOM Floating in the Latent Space (2412.05600)
💻 Code notebook: https://github.com/ESA-PhiLab/Major-TOM/blob/main/05-Generate-Major-TOM-Embeddings.ipynb
Post
5463
This week in open-source AI was insane 🤠 A small recap🕺🏻
merve/dec-6-releases-67545caebe9fc4776faac0a3
Multimodal 🖼️
> Google shipped a PaliGemma 2, new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants 👏
> OpenGVLab released InternVL2, seven new vision LMs in different sizes, with sota checkpoint with MIT license ✨
> Qwen team at Alibaba released the base models of Qwen2VL models with 2B, 7B and 72B ckpts
LLMs 💬
> Meta released a new iteration of Llama 70B, Llama3.2-70B trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with Apache 2.0 license 🔥
> Dataset: CohereForAI released GlobalMMLU, multilingual version of MMLU with 42 languages with Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with multilinguality update! 🔥 nearly 8TB pretraining data in many languages!
Image/Video Generation 🖼️
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux
Audio 🔊
> Indic-Parler-TTS is a new text2speech model made by community
Multimodal 🖼️
> Google shipped a PaliGemma 2, new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants 👏
> OpenGVLab released InternVL2, seven new vision LMs in different sizes, with sota checkpoint with MIT license ✨
> Qwen team at Alibaba released the base models of Qwen2VL models with 2B, 7B and 72B ckpts
LLMs 💬
> Meta released a new iteration of Llama 70B, Llama3.2-70B trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with Apache 2.0 license 🔥
> Dataset: CohereForAI released GlobalMMLU, multilingual version of MMLU with 42 languages with Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with multilinguality update! 🔥 nearly 8TB pretraining data in many languages!
Image/Video Generation 🖼️
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux
Audio 🔊
> Indic-Parler-TTS is a new text2speech model made by community
Post
1474
New InternVL drop with a state-of-the-art 78B vision language model with MIT license 🔥 https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c
The release comes with seven new vision LMs based on InternViT 300M/6B and Qwen2.5 (0.5B, 3B, 32B, 72B) and InternLM2 (8B, 7B, 20B) in different sizes
78B model is of InternViT 6B and Qwen2.5-72B Instruct, can accomplish variety of tasks 👏 Try here OpenGVLab/InternVL
The release comes with seven new vision LMs based on InternViT 300M/6B and Qwen2.5 (0.5B, 3B, 32B, 72B) and InternLM2 (8B, 7B, 20B) in different sizes
78B model is of InternViT 6B and Qwen2.5-72B Instruct, can accomplish variety of tasks 👏 Try here OpenGVLab/InternVL
prithivMLmods
posted
an
update
16 days ago
Post
3800
Near 3:2 { 1280*832 } Adapters 🔥
🧪The datasets were prepared for a 3:2 aspect ratio by processing images of any dimension (width × height) in alignment with the adapter's concept. This involved using techniques such as magic expand, magic fill, or outpainting to adjust the remaining parts of the image to achieve the 3:2 ratio & posts training. This approach enhanced the desired image quality to up to 2 MB for detailed prompts and reduced artifacts in images sized at 1280 × 832.
🎈This approach was used instead of cropping down the 2x or 3x zoomed positions in the actual image. It generative filling to adjust the image's aspect ratio proportionally within the dataset.
🔧I used Canva's Magic Expand, Firefly's Generative Fill, and Flux's Outpaint for aspect ratio adjustments.
⬇️Model DLC :
+ [ Microworld Nft ] : strangerzonehf/Flux-Microworld-NFT-LoRA
+ [ Creative Stocks ] : strangerzonehf/Flux-Creative-Stocks-LoRA
+ [ Icon-Kit ] : strangerzonehf/Flux-Icon-Kit-LoRA
+ [ Claymation ] : strangerzonehf/Flux-Claymation-XC-LoRA
+ [ Super Portrait ] : strangerzonehf/Flux-Super-Portrait-LoRA
+ [ Ghibli Art ] : strangerzonehf/Flux-Ghibli-Art-LoRA
+ [ Isometric Site ] : strangerzonehf/Flux-Isometric-Site-LoRA
🧨Page :
1] Stranger Zone: https://huggingface.co/strangerzonehf
💣Space :
1] Flux LoRA DLC: prithivMLmods/FLUX-LoRA-DLC
📦Collections :
1] strangerzonehf/flux-3dxl-engine-674833c14a001d5b1fdb5139
2] prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
3] strangerzonehf/animaker-engine-673714956dec98c400c30cf6
4] strangerzonehf/mixer-engine-673582c9c5939d8aa5bf9533
.
.
.
@prithivMLmods
🧪The datasets were prepared for a 3:2 aspect ratio by processing images of any dimension (width × height) in alignment with the adapter's concept. This involved using techniques such as magic expand, magic fill, or outpainting to adjust the remaining parts of the image to achieve the 3:2 ratio & posts training. This approach enhanced the desired image quality to up to 2 MB for detailed prompts and reduced artifacts in images sized at 1280 × 832.
🎈This approach was used instead of cropping down the 2x or 3x zoomed positions in the actual image. It generative filling to adjust the image's aspect ratio proportionally within the dataset.
🔧I used Canva's Magic Expand, Firefly's Generative Fill, and Flux's Outpaint for aspect ratio adjustments.
⬇️Model DLC :
+ [ Microworld Nft ] : strangerzonehf/Flux-Microworld-NFT-LoRA
+ [ Creative Stocks ] : strangerzonehf/Flux-Creative-Stocks-LoRA
+ [ Icon-Kit ] : strangerzonehf/Flux-Icon-Kit-LoRA
+ [ Claymation ] : strangerzonehf/Flux-Claymation-XC-LoRA
+ [ Super Portrait ] : strangerzonehf/Flux-Super-Portrait-LoRA
+ [ Ghibli Art ] : strangerzonehf/Flux-Ghibli-Art-LoRA
+ [ Isometric Site ] : strangerzonehf/Flux-Isometric-Site-LoRA
🧨Page :
1] Stranger Zone: https://huggingface.co/strangerzonehf
💣Space :
1] Flux LoRA DLC: prithivMLmods/FLUX-LoRA-DLC
📦Collections :
1] strangerzonehf/flux-3dxl-engine-674833c14a001d5b1fdb5139
2] prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
3] strangerzonehf/animaker-engine-673714956dec98c400c30cf6
4] strangerzonehf/mixer-engine-673582c9c5939d8aa5bf9533
.
.
.
@prithivMLmods
prithivMLmods
posted
an
update
19 days ago
Post
2614
Milestone for Flux.1 Dev 🔥
💢The Flux.1 Dev model has crossed 1️⃣0️⃣,0️⃣0️⃣0️⃣ creative public adapters! 🎈
🔗 https://huggingface.co/models?other=base_model:adapter:black-forest-labs/FLUX.1-dev
💢This includes:
- 266 Finetunes
- 19 Quants
- 4 Merges
💢 Here’s the 10,000th public adapter : 😜
+ strangerzonehf/Flux-3DXL-Partfile-0006
💢 Page :
+ https://huggingface.co/strangerzonehf
💢 Collection :
+ prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
💢The Flux.1 Dev model has crossed 1️⃣0️⃣,0️⃣0️⃣0️⃣ creative public adapters! 🎈
🔗 https://huggingface.co/models?other=base_model:adapter:black-forest-labs/FLUX.1-dev
💢This includes:
- 266 Finetunes
- 19 Quants
- 4 Merges
💢 Here’s the 10,000th public adapter : 😜
+ strangerzonehf/Flux-3DXL-Partfile-0006
💢 Page :
+ https://huggingface.co/strangerzonehf
💢 Collection :
+ prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
Post
2610
small but mighty 🔥
you can fine-tune SmolVLM on an L4 with batch size of 4 and it will only take 16.4 GB VRAM 🫰🏻 also with gradient accumulation simulated batch size is 16 ✨
I made a notebook that includes all the goodies: QLoRA, gradient accumulation, gradient checkpointing with explanations on how they work 💝 https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
you can fine-tune SmolVLM on an L4 with batch size of 4 and it will only take 16.4 GB VRAM 🫰🏻 also with gradient accumulation simulated batch size is 16 ✨
I made a notebook that includes all the goodies: QLoRA, gradient accumulation, gradient checkpointing with explanations on how they work 💝 https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Post
2845
Last week we were blessed with open-source models! A recap 💝
merve/nov-29-releases-674ccc255a57baf97b1e2d31
🖼️ Multimodal
> At Hugging Face we released SmolVLM, a performant and efficient smol vision language model 💗
> Show Lab released ShowUI-2B: new vision-language-action model to build GUI/web automation agents 🤖
> Rhymes AI has released the base model of Aria: Aria-Base-64K and Aria-Base-8K with their respective context length
> ViDoRe team released ColSmolVLM: A new ColPali-like retrieval model based on SmolVLM
> Dataset: Llava-CoT-o1-Instruct: new dataset labelled using Llava-CoT multimodal reasoning model📖
> Dataset: LLaVA-CoT-100k dataset used to train Llava-CoT released by creators of Llava-CoT 📕
💬 LLMs
> Qwen team released QwQ-32B-Preview, state-of-the-art open-source reasoning model, broke the internet 🔥
> AliBaba has released Marco-o1, a new open-source reasoning model 💥
> NVIDIA released Hymba 1.5B Base and Instruct, the new state-of-the-art SLMs with hybrid architecture (Mamba + transformer)
⏯️ Image/Video Generation
> Qwen2VL-Flux: new image generation model based on Qwen2VL image encoder, T5 and Flux for generation
> Lightricks released LTX-Video, a new DiT-based video generation model that can generate 24 FPS videos at 768x512 res ⏯️
> Dataset: Image Preferences is a new image generation preference dataset made with DIBT community effort of Argilla 🏷️
Audio
> OuteAI released OuteTTS-0.2-500M new multilingual text-to-speech model based on Qwen-2.5-0.5B trained on 5B audio prompt tokens
merve/nov-29-releases-674ccc255a57baf97b1e2d31
🖼️ Multimodal
> At Hugging Face we released SmolVLM, a performant and efficient smol vision language model 💗
> Show Lab released ShowUI-2B: new vision-language-action model to build GUI/web automation agents 🤖
> Rhymes AI has released the base model of Aria: Aria-Base-64K and Aria-Base-8K with their respective context length
> ViDoRe team released ColSmolVLM: A new ColPali-like retrieval model based on SmolVLM
> Dataset: Llava-CoT-o1-Instruct: new dataset labelled using Llava-CoT multimodal reasoning model📖
> Dataset: LLaVA-CoT-100k dataset used to train Llava-CoT released by creators of Llava-CoT 📕
💬 LLMs
> Qwen team released QwQ-32B-Preview, state-of-the-art open-source reasoning model, broke the internet 🔥
> AliBaba has released Marco-o1, a new open-source reasoning model 💥
> NVIDIA released Hymba 1.5B Base and Instruct, the new state-of-the-art SLMs with hybrid architecture (Mamba + transformer)
⏯️ Image/Video Generation
> Qwen2VL-Flux: new image generation model based on Qwen2VL image encoder, T5 and Flux for generation
> Lightricks released LTX-Video, a new DiT-based video generation model that can generate 24 FPS videos at 768x512 res ⏯️
> Dataset: Image Preferences is a new image generation preference dataset made with DIBT community effort of Argilla 🏷️
Audio
> OuteAI released OuteTTS-0.2-500M new multilingual text-to-speech model based on Qwen-2.5-0.5B trained on 5B audio prompt tokens
prithivMLmods
posted
an
update
24 days ago
Post
2715
Fine-Textured [Polygon] Character 3D Design Renders 🙉
Adapters capable of providing better lighting control (Bn+, Bn-) and richer textures compared to previous sets require more contextual prompts for optimal performance.
The ideal settings are achieved at inference steps around 30–35, with the best dimensions being 1280 x 832 [ 3:2 ]. However, it also performs well with the default settings of 1024 x 1024 [ 1:1 ].
💢Models DLC :
+ strangerzonehf/Flux-3DXL-Partfile-0001
+ strangerzonehf/Flux-3DXL-Partfile-0002
+ strangerzonehf/Flux-3DXL-Partfile-0003
+ strangerzonehf/Flux-3DXL-Partfile-0004
+ strangerzonehf/Flux-3DXL-Partfile-C0001
💢Collections :
1] strangerzonehf/flux-3dxl-engine-674833c14a001d5b1fdb5139
2] prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
💢Space :
1] prithivMLmods/FLUX-LoRA-DLC
💢Page :
1] Stranger Zone: https://huggingface.co/strangerzonehf
.
.
.
@prithivMLmods 🤗
Adapters capable of providing better lighting control (Bn+, Bn-) and richer textures compared to previous sets require more contextual prompts for optimal performance.
The ideal settings are achieved at inference steps around 30–35, with the best dimensions being 1280 x 832 [ 3:2 ]. However, it also performs well with the default settings of 1024 x 1024 [ 1:1 ].
💢Models DLC :
+ strangerzonehf/Flux-3DXL-Partfile-0001
+ strangerzonehf/Flux-3DXL-Partfile-0002
+ strangerzonehf/Flux-3DXL-Partfile-0003
+ strangerzonehf/Flux-3DXL-Partfile-0004
+ strangerzonehf/Flux-3DXL-Partfile-C0001
💢Collections :
1] strangerzonehf/flux-3dxl-engine-674833c14a001d5b1fdb5139
2] prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
💢Space :
1] prithivMLmods/FLUX-LoRA-DLC
💢Page :
1] Stranger Zone: https://huggingface.co/strangerzonehf
.
.
.
@prithivMLmods 🤗
Post
2148
The authors of ColPali trained a retrieval model based on SmolVLM 🤠
vidore/colsmolvlm-alpha
TLDR;
- ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks
- ColSmolVLM is more memory efficient than ColQwen2 💗
TLDR;
- ColSmolVLM performs better than ColPali and DSE-Qwen2 on all English tasks
- ColSmolVLM is more memory efficient than ColQwen2 💗
Post
3845
Small yet mighty! 💫
We are releasing SmolVLM: a new 2B small vision language made for on-device use, fine-tunable on consumer GPU, immensely memory efficient 🤠
We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39
Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO 💝
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO 💗
We are releasing SmolVLM: a new 2B small vision language made for on-device use, fine-tunable on consumer GPU, immensely memory efficient 🤠
We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39
Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO 💝
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO 💗
prithivMLmods
posted
an
update
27 days ago
Post
3259
HF Posts Receipts 🏆🚀
[ HF POSTS RECEIPT ] : prithivMLmods/HF-POSTS-RECEIPT
🥠The one thing that needs to be remembered is the 'username'.
🥠And yeah, thank you, @maxiw , for creating the awesome dataset and sharing them here! 🙌
🥠[ Dataset ] : maxiw/hf-posts
.
.
.
@prithivMLmods
[ HF POSTS RECEIPT ] : prithivMLmods/HF-POSTS-RECEIPT
🥠The one thing that needs to be remembered is the 'username'.
🥠And yeah, thank you, @maxiw , for creating the awesome dataset and sharing them here! 🙌
🥠[ Dataset ] : maxiw/hf-posts
.
.
.
@prithivMLmods