Post: New open Vision Language Model by @Google: PaliGemma
- Comes in 3B pretrained, mix, and fine-tuned models at 224, 448, and 896 resolution
- Combination of the Gemma 2B LLM and the SigLIP image encoder
- Supported in transformers

PaliGemma can do:
- Image segmentation and detection!
- Detailed document understanding and reasoning
- Visual question answering, captioning, and any other VLM task!

Read our blog: hf.co/blog/paligemma
Try the demo: hf.co/spaces/google/paligemma
Check out the Spaces and the models, all in the collection: google/paligemma-release-6643a9ffbf57de2ae0448dda
Collection of fine-tuned PaliGemma models: google/paligemma-ft-models-6643b03efb769dad650d2dda
Article: SeeMoE: Implementing a MoE Vision Language Model from Scratch. By AviSoori1x · Jun 23
Collection: [lecture artifacts] aligning open language models. Artifacts referenced in the talk timeline! Slides: https://docs.google.com/presentation/d/1quMyI4BAx4rvcDfk8jjv063bmHg4RxZd9mhQloXpMn0/edit?usp=sharin · 63 items · Updated Apr 17
Article: Fine-tuning a large language model on Kaggle Notebooks (or even on your own computer) for solving real-world tasks. By lmassaron · Feb 21