Xiangtai Li (LXT)
8 followers · 0 following
https://lxtgh.github.io/
Twitter: xtl994 · GitHub: lxtGH · LinkedIn: xiangtai-li-b1b76112a
AI & ML interests
Computer Vision, Multi-Modal Understanding, Generative AI
Recent Activity
reacted to merve's post · 1 day ago
ByteDance just dropped Sa2VA: a new family of vision LMs combining Qwen2VL/InternVL and SAM2, under an MIT license. https://huggingface.co/collections/ByteDance/sa2va-model-zoo-677e3084d71b5f108d00e093
> The models can do vision-language understanding and visual referrals (referring segmentation) for both images and videos.
> They come in 1B, 4B, and 8B sizes, using InternVL2.5 as the base architecture and Qwen2, Qwen2.5, or InternLM2 as the language model (depending on the checkpoint).
> The model is very interesting: it has a separate encoder for each modality (visual prompt, text prompt, image, and video), concatenates their outputs to feed into the LLM, and passes the output segmentation tokens to SAM2 to match text (captions or semantic classes) to masks.
> Their annotation pipeline is also interesting: they seem to use two open large vision LMs to refine the annotations, with different levels of description to provide consistency.
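As a quick way to try the checkpoints from the collection above, here is a minimal usage sketch through `transformers`. It assumes the remote-code API (`predict_forward` and the `prediction` / `prediction_masks` return keys) documented on the Sa2VA model cards; the exact method and key names are assumptions and may differ between checkpoints.

```python
# Minimal sketch: referring segmentation with a Sa2VA checkpoint via
# transformers' remote-code path. `predict_forward` and its return keys
# follow the pattern shown on the Sa2VA model cards, but they are custom
# code shipped with the checkpoint -- treat the names as assumptions.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "ByteDance/Sa2VA-4B"  # also available in 1B and 8B variants
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Sa2VA ships its own modeling code
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")
# The "<image>" token prefixes the referring expression, per the model-card example.
result = model.predict_forward(
    image=image,
    text="<image>Please segment the person on the left.",
    tokenizer=tokenizer,
)
print(result["prediction"])             # text answer containing segmentation tokens
masks = result.get("prediction_masks")  # binary masks decoded by the SAM2 head
```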
authored a paper · 2 days ago
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
LXT's activity
liked a model · 3 days ago
ByteDance/Sa2VA-4B — Image-Text-to-Text • Updated about 12 hours ago • 399 downloads • 29 likes

liked a dataset · 22 days ago
zhangtao-whu/OMG-LLaVA — Updated Jul 3, 2024 • 829 downloads • 3 likes

liked a dataset · about 1 month ago
jianzongwu/MangaZero — Viewer • Updated about 1 month ago • 32.7k rows • 198 downloads • 21 likes

liked a model · 2 months ago
Collov-Labs/Monetico — Text-to-Image • Updated Oct 28, 2024 • 26 downloads • 65 likes

liked a Space · 3 months ago
Meissonic Flow — Running on Zero • 49 likes

liked a model · 3 months ago
MeissonFlow/Meissonic — Text-to-Image • Updated Dec 5, 2024 • 44 downloads • 98 likes

liked 2 models · 6 months ago
zhangtao-whu/OMG-LLaVA — Updated Jul 3, 2024 • 5 likes
PhoenixZ/MG-LLaVA — Updated Jun 26, 2024 • 7 likes

liked a Space · 6 months ago
FaceAdapter — Runtime error • 30 likes

liked a model · 7 months ago
openlm-research/open_llama_3b — Text Generation • Updated Jun 16, 2023 • 127k downloads • 154 likes

liked a dataset · 9 months ago
ILSVRC/imagenet-1k — Updated Jul 16, 2024 • 16.6k downloads • 439 likes

liked a Space · 12 months ago
OMG-SEG — Sleeping • 16 likes

liked a model · 12 months ago
LXT/OMG_Seg — Updated Jan 19, 2024 • 7 likes