Add FT tutorial link
README.md CHANGED

@@ -35,7 +35,7 @@ SmolVLM is a compact open multimodal model that accepts arbitrary sequences of i
 
 SmolVLM can be used for inference on multimodal (image + text) tasks where the input comprises text queries along with one or more images. Text and images can be interleaved arbitrarily, enabling tasks like image captioning, visual question answering, and storytelling based on visual content. The model does not support image generation.
 
-To fine-tune SmolVLM on a specific task, you can follow the fine-tuning tutorial.
+To fine-tune SmolVLM on a specific task, you can follow the [fine-tuning tutorial](https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb).
 <!-- todo: add link to fine-tuning tutorial -->
 
 ### Technical Summary