Commit e036794
1 Parent(s): 19d53c2
Update README.md
README.md CHANGED
@@ -30,6 +30,8 @@ The dataset consists of **856 {image: caption}** pairs, providing a substantial
 The model is conditioned on both CLIP image tokens and text tokens and employs a **teacher forcing** training approach. It predicts the next text token while considering the context provided by the image and previous text tokens.
 
 
+
+
 ## Limitations
 + The quality of generated captions may vary depending on the complexity and diversity of images from the 'One-Piece-anime-captions' dataset.
 + The model's output is based on the data it was fine-tuned on, so it may not generalize well to images outside the dataset's domain.
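The README text shown in this diff describes a model conditioned on CLIP image tokens and text tokens, trained with teacher forcing to predict the next text token. The sketch below illustrates that training setup in PyTorch; it is not code from this repository, and `TinyCaptioner`, the dimensions, and all other names are hypothetical.

```python
# Illustrative teacher-forcing step for an image-conditioned captioner:
# projected image features (e.g. CLIP patch embeddings) are prepended to the
# text token embeddings, and the model predicts each next text token.
# Positional encodings are omitted for brevity; all names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyCaptioner(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, image_dim=512, n_layers=2):
        super().__init__()
        self.img_proj = nn.Linear(image_dim, d_model)     # map image features to model width
        self.tok_emb = nn.Embedding(vocab_size, d_model)  # text token embeddings
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image_feats, text_ids):
        # image_feats: (B, n_img, image_dim) -- e.g. CLIP vision-encoder outputs
        # text_ids:    (B, n_txt)            -- ground-truth caption tokens (teacher forcing)
        img = self.img_proj(image_feats)
        txt = self.tok_emb(text_ids)
        x = torch.cat([img, txt], dim=1)                  # image tokens come first
        # additive causal mask: -inf above the diagonal blocks attention to future positions
        n = x.size(1)
        mask = torch.full((n, n), float("-inf"), device=x.device).triu(1)
        h = self.backbone(x, mask=mask)
        return self.lm_head(h[:, img.size(1):])           # logits over text positions only


# One teacher-forced step: feed caption tokens [0..T-2], predict tokens [1..T-1].
model = TinyCaptioner()
image_feats = torch.randn(2, 50, 512)                     # dummy image features
captions = torch.randint(0, 1000, (2, 16))                # dummy caption token ids
logits = model(image_feats, captions[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), captions[:, 1:].reshape(-1))
loss.backward()
```

The essential point of teacher forcing is that the ground-truth caption (shifted by one position) is fed as input during training, so the loss at each position compares the prediction against the actual next token rather than the model's own previous output.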