Looks great! I am currently working on simplifying the training/fine-tuning of multimodal LLMs in Torch: https://github.com/ritabratamaiti/AnyModal

The current demos in AnyModal are for visual + text tasks. We plan to add demos for other modalities, such as audio, in the future. Our goal is to make it easy for anyone to create multimodal LLMs from any input-modality tokenizer + LLM combination (hence the name AnyModal)!
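For readers unfamiliar with the pattern, the usual recipe behind such encoder + LLM combinations is: encode the non-text input with a modality-specific encoder, project the features into the LLM's token-embedding space, and prepend the projected tokens to the text embeddings. Below is a minimal PyTorch sketch of that general pattern; the class and argument names are hypothetical, and this is not AnyModal's actual API.

```python
# Minimal sketch of the encoder -> projector -> LLM pattern.
# Hypothetical names; not AnyModal's actual API.
import torch
import torch.nn as nn

class MultimodalLM(nn.Module):
    def __init__(self, encoder, llm, encoder_dim, llm_dim):
        super().__init__()
        self.encoder = encoder  # e.g. a frozen vision or audio encoder
        self.llm = llm          # a causal LM that accepts inputs_embeds
        # Linear projector mapping encoder features into the LLM embedding space
        self.projector = nn.Linear(encoder_dim, llm_dim)

    def forward(self, modality_input, input_ids):
        # Encode the non-text input; assumed shape: (batch, n_tokens, encoder_dim)
        features = self.encoder(modality_input)
        # Project into the LLM's embedding space: (batch, n_tokens, llm_dim)
        projected = self.projector(features)
        # Embed the text tokens and prepend the projected modality tokens
        text_embeds = self.llm.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([projected, text_embeds], dim=1)
        return self.llm(inputs_embeds=inputs_embeds)

# Usage sketch (assumes a Hugging Face causal LM and any encoder returning
# features of shape (batch, n_tokens, encoder_dim)):
# llm = AutoModelForCausalLM.from_pretrained("gpt2")
# model = MultimodalLM(vision_encoder, llm, encoder_dim=768, llm_dim=768)
```

A common design choice with this setup is to freeze both the encoder and the LLM and train only the projector, which keeps fine-tuning cheap.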