
Add idefics2-8b for HuggingChat

#53
by wangdafa - opened

HuggingChat doesn't have a multimodal model yet.

HuggingFaceM4 org
edited May 17

We plan to do that if we train larger versions of the model. Right now, at the 8B scale, even the best models are still a bit too immature and often hallucinate.

@HugoLaurencon I think it would be nice to have a smaller model, for example one based on phi-3. I'm trying that, but maybe convert_idefics2_weights_to_hf doesn't work for it?

HuggingFaceM4 org

It should work for a llama/mistral architecture, but if phi-3 differs from those you might need to adapt the script.
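As a rough illustration of the kind of adaptation that might be needed: the Phi-3 implementation in Transformers stores fused `qkv_proj` and `gate_up_proj` weights, while llama/mistral use separate `q_proj`/`k_proj`/`v_proj` and `gate_proj`/`up_proj`. Whether this is the difference that actually trips up the conversion script is an assumption, and the helper names below (`split_fused_rows`, `unfuse_phi3_layer`) are hypothetical and not part of any library. A minimal sketch of splitting the fused weights for one layer:

```python
def split_fused_rows(weight, sizes):
    """Split a fused projection weight along its output (row) dimension.

    `weight` can be anything that supports len() and slicing on dim 0
    (nested list, NumPy array, torch.Tensor). `sizes` is rows per chunk.
    """
    if sum(sizes) != len(weight):
        raise ValueError(f"sizes {sizes} do not add up to {len(weight)} rows")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(weight[start:start + size])
        start += size
    return chunks


def unfuse_phi3_layer(state_dict, prefix, num_heads, num_kv_heads, head_dim):
    """Return a copy of `state_dict` with one layer's fused weights split.

    Assumes Phi-3 style keys under `prefix` (e.g. "model.layers.0") and
    writes llama/mistral style keys instead.
    """
    out = dict(state_dict)

    # qkv_proj rows are laid out as [query | key | value].
    qkv = out.pop(f"{prefix}.self_attn.qkv_proj.weight")
    q_rows = num_heads * head_dim
    kv_rows = num_kv_heads * head_dim
    q, k, v = split_fused_rows(qkv, [q_rows, kv_rows, kv_rows])
    out[f"{prefix}.self_attn.q_proj.weight"] = q
    out[f"{prefix}.self_attn.k_proj.weight"] = k
    out[f"{prefix}.self_attn.v_proj.weight"] = v

    # gate_up_proj rows are laid out as [gate | up], two equal halves.
    gate_up = out.pop(f"{prefix}.mlp.gate_up_proj.weight")
    half = len(gate_up) // 2
    gate, up = split_fused_rows(gate_up, [half, len(gate_up) - half])
    out[f"{prefix}.mlp.gate_proj.weight"] = gate
    out[f"{prefix}.mlp.up_proj.weight"] = up
    return out
```

You would call `unfuse_phi3_layer` once per decoder layer, with `num_heads`, `num_kv_heads`, and `head_dim` taken from the model config. If the script instead expects to go the other way (separate weights into fused ones), the same layout applies in reverse via concatenation.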
