updated tag
README.md CHANGED
@@ -7,6 +7,7 @@ tags:
 - Image-to-Text
 - OCR
 - Image-Captioning
+- Text-Recognition
 datasets:
 - priyank-m/text_recognition_en_zh_clean
 metrics:
@@ -36,12 +37,4 @@ Notes and observations:
 12. A streaming dataset might be another good option if the dataset size were to increase any further.
 13. The free GPU on Colab does not seem to be enough for this experiment: keeping two models in GPU memory while training forces a small batch size, and the free GPUs (T4) are not fast enough.
 14. A very important data cleaning step was to check that each sample's image and text can be converted to the input format expected by the model. The text must be non-empty when converted back from the input IDs to text (some characters are not recognized by the tokenizer and get converted to special tokens, which are usually skipped when converting input IDs back to text), because a non-empty reference is required for the CER calculation.
-15. Resuming model training was taking almost one, sometimes two, hours just to skip already-seen batches. To avoid this waste, one possible solution would be to shuffle the training dataset before starting the training and then skip no batches on resume. This would be particularly useful when the dataset size is increased further.
-
-
-
-
-
-
-
-
+15. Resuming model training was taking almost one, sometimes two, hours just to skip already-seen batches. To avoid this waste, one possible solution would be to shuffle the training dataset before starting the training and then skip no batches on resume. This would be particularly useful when the dataset size is increased further.
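For note 12 above, a minimal sketch of what streaming the same dataset could look like with the Hugging Face `datasets` library; the split name, seed, and buffer size are illustrative assumptions rather than values from the actual training setup.

```python
from datasets import load_dataset

# Load the dataset lazily instead of downloading it in full; samples are
# fetched on the fly as the training loop iterates over them.
train_ds = load_dataset(
    "priyank-m/text_recognition_en_zh_clean",
    split="train",          # assumed split name
    streaming=True,
)

# Streaming datasets only support an approximate shuffle through a buffer.
train_ds = train_ds.shuffle(seed=42, buffer_size=10_000)

# Peek at a few samples to confirm the stream works.
for sample in train_ds.take(3):
    print(sample.keys())
```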
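A rough sketch of the round-trip check described in note 14, assuming a TrOCR-style processor; the checkpoint name, the column names `image` and `text`, and the variable `raw_ds` (a non-streaming `datasets` dataset loaded beforehand) are placeholders, not the actual training code.

```python
from transformers import TrOCRProcessor

# Placeholder checkpoint; the processor/tokenizer used for training may differ.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")

def is_convertible(sample) -> bool:
    """Keep a sample only if both image and text survive conversion to the
    model's input format."""
    try:
        # The image must convert cleanly to pixel values.
        processor(images=sample["image"], return_tensors="pt")
    except Exception:
        return False

    # Encode the label text, then decode it back while skipping special tokens.
    input_ids = processor.tokenizer(sample["text"]).input_ids
    decoded = processor.tokenizer.decode(input_ids, skip_special_tokens=True)

    # Characters unknown to the tokenizer collapse into special tokens and
    # vanish on decoding; an empty reference string breaks the CER metric.
    return len(decoded.strip()) > 0

clean_ds = raw_ds.filter(is_convertible)
```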
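For note 15, one way to avoid the slow batch skipping on resume could be to re-shuffle the training split for each run and tell the `Trainer` not to fast-forward through already-seen batches via its `ignore_data_skip` option; `model`, `train_ds`, and the output directory are assumed to exist and are not the values used in the actual experiment.

```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

# Give each (re)run its own shuffle so replaying the exact batch order
# from the previous run is unnecessary.
train_ds = train_ds.shuffle(seed=2023)     # assumed: non-streaming datasets.Dataset

training_args = Seq2SeqTrainingArguments(
    output_dir="trocr-en-zh-checkpoints",  # placeholder path
    per_device_train_batch_size=8,         # small batch forced by GPU memory (note 13)
    ignore_data_skip=True,                 # do not skip batches when resuming
)

trainer = Seq2SeqTrainer(
    model=model,            # assumed: the vision-encoder-decoder model already loaded
    args=training_args,
    train_dataset=train_ds,
)

# Resume from the latest checkpoint in output_dir without replaying batches.
trainer.train(resume_from_checkpoint=True)
```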