priyank-m committed
Commit 01f884d
1 Parent(s): 6d7d747

updated tag

Files changed (1):
  1. README.md +2 -9
README.md CHANGED
@@ -7,6 +7,7 @@ tags:
 - Image-to-Text
 - OCR
 - Image-Captioning
+- Text-Recognition
 datasets:
 - priyank-m/text_recognition_en_zh_clean
 metrics:
@@ -36,12 +37,4 @@ Notes and observations:
 12. Streaming the dataset might be another good option if the dataset size were to increase any further.
 13. The free GPU on Colab seems not to be enough for this experiment: keeping two models in GPU memory during training forces a small batch size, and the free GPUs (T4) are not fast enough.
 14. A very important data cleaning step was to check that each sample's image and text can be converted to the input format expected by the model, and that the text is still non-empty when converted back from input IDs (some characters are not identified by the tokenizer and become special tokens, which are skipped when converting input IDs back to text); a non-empty reference text is required for the CER calculation.
-15. Resuming model training was taking almost one, sometimes two, hours just skipping already-processed batches; to avoid this waste, one possible solution would be to shuffle the training dataset before starting training and then skip the batch replay entirely. This would be particularly useful when we increase the dataset size further.
-
-
-
-
-
-
-
-
+15. Resuming model training was taking almost one, sometimes two, hours just skipping already-processed batches; to avoid this waste, one possible solution would be to shuffle the training dataset before starting training and then skip the batch replay entirely. This would be particularly useful when we increase the dataset size further.
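
Note 12 above suggests streaming; a minimal sketch with the `datasets` library, assuming the training split of the dataset named in the README's `datasets:` tag:

```python
from datasets import load_dataset

# streaming=True yields an IterableDataset that fetches samples on demand,
# so a growing dataset never has to be fully downloaded or cached to disk.
ds = load_dataset(
    "priyank-m/text_recognition_en_zh_clean",
    split="train",
    streaming=True,
)

# Peek at a couple of samples to confirm the schema before training.
for sample in ds.take(2):
    print(sample.keys())
```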
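The round-trip check in note 14 could look like the sketch below; the checkpoint name and the `text` column are placeholders, not the author's actual code:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint: the README does not name the tokenizer in use.
tokenizer = AutoTokenizer.from_pretrained("your-tokenizer-checkpoint")

def is_clean(text: str) -> bool:
    # Characters the tokenizer cannot identify become special tokens, and
    # special tokens are skipped on decode, so text made of unknown characters
    # round-trips to an empty string, which would break the CER calculation.
    ids = tokenizer(text).input_ids
    round_trip = tokenizer.decode(ids, skip_special_tokens=True)
    return len(round_trip.strip()) > 0

# Keep only samples whose label text survives the round trip
# ("text" is an assumed column name):
# dataset = dataset.filter(lambda sample: is_clean(sample["text"]))
```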
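For note 15, if the run uses the `transformers` Trainer (an assumption; the README does not say), a one-time shuffle combined with the Trainer's `ignore_data_skip` argument would implement the proposed fix:

```python
from datasets import load_dataset
from transformers import TrainingArguments

# Shuffle the whole training set once, deterministically, before training...
train_dataset = load_dataset(
    "priyank-m/text_recognition_en_zh_clean", split="train"
).shuffle(seed=42)

args = TrainingArguments(
    output_dir="checkpoints",
    # ...then skip the slow batch replay when resuming from a checkpoint.
    # Trade-off: the resumed run sees batches in a different order than a
    # run that was never interrupted.
    ignore_data_skip=True,
)
```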