Score image-text similarity using CLIP or SigLIP models
Display chatbot leaderboard and stats
Annotate and describe images with text prompts
Cobra: Extending Mamba to MLLM for Efficient Inference