Multimodal AI - a shi-labs Collection

Updated Dec 11, 2024

Large multimodal models

OLA-VLM 🔍
Generate images and insights from text and images

Running on Zero • 5
CuMo 7b Zero 🐐
Generate text based on images and text input

Running on Zero • 82
VCoder ✌

Runtime error • 63
shi-labs/vcoder_ds_llava-v1.5-13b

Text Generation • Updated Dec 20, 2023 • 21 • 4
shi-labs/CuMo-mistral-7b

Text Generation • Updated May 9, 2024 • 53 • 15
shi-labs/CuMo-mixtral-8x7b

Text Generation • Updated May 9, 2024 • 16 • 3
shi-labs/vcoder_llava-v1.5-7b

Text Generation • Updated Dec 20, 2023 • 21 • 2
shi-labs/vcoder_ds_llava-v1.5-7b

Text Generation • Updated Dec 20, 2023 • 19
shi-labs/vcoder_llava-v1.5-13b

Text Generation • Updated Dec 20, 2023 • 25 • 4
VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Paper • 2312.14233 • Published Dec 21, 2023 • 16
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

Paper • 2405.05949 • Published May 9, 2024 • 2
shi-labs/OLA-VLM-CLIP-ViT-Phi3-4k-mini

Image-Text-to-Text • Updated Dec 10, 2024 • 9 • 1
shi-labs/OLA-VLM-CLIP-ConvNeXT-Llama3-8b

Image-Text-to-Text • Updated Dec 10, 2024 • 7 • 1
shi-labs/OLA-VLM-CLIP-ConvNeXT-Phi3-4k-mini

Image-Text-to-Text • Updated Dec 10, 2024 • 11 • 1
shi-labs/vpt_OLA-VLM-CLIP-ConvNeXT-Llama3-8b

Image-Text-to-Text • Updated Dec 10, 2024 • 8 • 2
shi-labs/OLA-VLM-CLIP-ViT-Llama3-8b

Image-Text-to-Text • Updated Dec 10, 2024 • 9
shi-labs/pretrain_dsg_OLA-VLM-CLIP-ViT-Llama3-8b

Image-Text-to-Text • Updated Dec 10, 2024 • 19