Update README.md
README.md CHANGED
@@ -43,7 +43,7 @@ SmolVLM2-256M-Video is a lightweight multimodal model designed to analyze video
 
 SmolVLM2 can be used for inference on multimodal (video / image / text) tasks where the input consists of text queries along with video or one or more images. Text and media files can be interleaved arbitrarily, enabling tasks like captioning, visual question answering, and storytelling based on visual content. The model does not support image or video generation.
 
-To fine-tune SmolVLM2 on a specific task, you can follow [the fine-tuning tutorial](
+To fine-tune SmolVLM2 on a specific task, you can follow [the fine-tuning tutorial](https://github.com/huggingface/smollm/blob/main/vision/finetuning/Smol_VLM_FT.ipynb).
 
 ## Evaluation
 
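For context, below is a minimal inference sketch along the lines of the usage described in the diffed README paragraph. It is not part of this commit; the checkpoint id `HuggingFaceTB/SmolVLM2-256M-Video-Instruct`, the image URL, and the exact `transformers` chat-template calls are assumptions based on the standard image-text-to-text workflow in recent `transformers` releases, not taken from this diff.

```python
# Hypothetical inference sketch for SmolVLM2 (not part of this commit).
# Assumes a recent `transformers` release with image-text-to-text support
# and the HuggingFaceTB/SmolVLM2-256M-Video-Instruct checkpoint (assumed id).
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "HuggingFaceTB/SmolVLM2-256M-Video-Instruct"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda" if torch.cuda.is_available() else "cpu")

# Interleave an image with a text query, as the README paragraph describes.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/image.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# apply_chat_template with tokenize=True returns processed tensors ready for generate().
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```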