Juan CM

jucamohedano
AI & ML interests

Deep Learning and Robotics 🚀🤖

Recent Activity

Organizations

jucamohedano's activity

upvoted an article 6 months ago
Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

• 211
Reacted to merve's post with 🚀 6 months ago
Post
1755
New open Vision Language Model by @Google: PaliGemma 💙🤍

📏 Comes in 3B, pretrained, mix and fine-tuned models in 224, 448 and 896 resolution
🧩 Combination of Gemma 2B LLM and SigLIP image encoder
🤗 Supported in transformers

PaliGemma can do..
🧩 Image segmentation and detection! 🤯
📑 Detailed document understanding and reasoning
🙋 Visual question answering, captioning and any other VLM task!

Read our blog 🔖 hf.co/blog/paligemma
Try the demo 🪀 hf.co/spaces/google/paligemma
Check out the Spaces and the models all in the collection 📚 google/paligemma-release-6643a9ffbf57de2ae0448dda
Collection of fine-tuned PaliGemma models google/paligemma-ft-models-6643b03efb769dad650d2dda
upvoted an article 7 months ago
Article

SeeMoE: Implementing a MoE Vision Language Model from Scratch

By AviSoori1x
• 34
upvoted 2 articles 7 months ago
Article

Fine-tuning a large language model on Kaggle Notebooks (or even on your own computer) for solving real-world tasks

By lmassaron
• 13
Article

Design choices for Vision Language Models in 2024

By gigant
• 25
upvoted an article 8 months ago
Article

Vision Language Models Explained

• 214
upvoted an article 8 months ago
Article

Mixture of Experts Explained

• 199