Young Ho Shin
committed on
Commit • 505f1ad
1 Parent(s): 03d8885
Minor editing
article.md CHANGED (+1 -1)
@@ -41,7 +41,7 @@ For the data, I used the `im2latex-100k` dataset, which includes a total of roug
 Some preprocessing steps were done by Harvard NLP for the [`im2markup` project](https://github.com/harvardnlp/im2markup).
 To limit the scope of the project and simplify the task, I limited the training data to equations containing 100 LaTeX tokens or fewer.
 This covers most single-line equations, including fractions, subscripts, symbols, etc., but does not cover large multi-line equations, some of which can have up to 500 LaTeX tokens.
-GPU training was done
+GPU training was done on a Kaggle GPU Kernel in roughly 3 hours.
 You can find the full training code on my Kaggle profile [here](https://www.kaggle.com/code/younghoshin/finetuning-trocr/notebook).
 
 ## What's next?
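
The 100-token cutoff mentioned in the hunk above is straightforward to reproduce. Below is a minimal sketch, not the notebook's actual code: it assumes the im2markup-style preprocessing output, where each line of the formulas file is one normalized formula with tokens separated by single spaces; the file name and threshold constant are illustrative.

```python
# Sketch: keep only im2latex-100k formulas with <= 100 LaTeX tokens.
# Assumes im2markup-style preprocessing output: one normalized formula
# per line, tokens separated by single spaces. Paths are illustrative.
MAX_TOKENS = 100

with open("im2latex_formulas.norm.lst", encoding="utf-8") as src:
    formulas = src.read().splitlines()

# Token count is just the number of whitespace-separated pieces,
# since the preprocessed formulas are space-delimited token streams.
kept = [f for f in formulas if len(f.split()) <= MAX_TOKENS]

print(f"kept {len(kept)} of {len(formulas)} formulas")
```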
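
For a feel of the training run itself (the roughly 3-hour Kaggle GPU job this commit describes), here is a hedged sketch of fine-tuning TrOCR with Hugging Face `transformers`. This is not the linked notebook's code: the checkpoint name (`microsoft/trocr-base-stage1`), the `LatexOCRDataset` wrapper, the placeholder `train_samples`, and all hyperparameters are assumptions for illustration.

```python
import torch
from PIL import Image
from torch.utils.data import Dataset
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    TrOCRProcessor,
    VisionEncoderDecoderModel,
)

class LatexOCRDataset(Dataset):
    """(image path, LaTeX string) pairs -> model inputs. Hypothetical wrapper."""

    def __init__(self, samples, processor, max_length=100):
        self.samples = samples
        self.processor = processor
        self.max_length = max_length

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image_path, latex = self.samples[idx]
        image = Image.open(image_path).convert("RGB")
        pixel_values = self.processor(image, return_tensors="pt").pixel_values
        labels = self.processor.tokenizer(
            latex,
            padding="max_length",
            max_length=self.max_length,
            truncation=True,
        ).input_ids
        # Mask padding positions so they are ignored by the cross-entropy loss.
        pad_id = self.processor.tokenizer.pad_token_id
        labels = [tok if tok != pad_id else -100 for tok in labels]
        return {
            "pixel_values": pixel_values.squeeze(0),
            "labels": torch.tensor(labels),
        }

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-stage1")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-stage1")

# The decoder needs to know how generation starts and how padding is encoded.
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id
model.config.vocab_size = model.config.decoder.vocab_size

# Placeholder data; in practice this would be the filtered im2latex-100k pairs.
train_samples = [("equation_00001.png", r"\frac { a } { b }")]
train_dataset = LatexOCRDataset(train_samples, processor)

args = Seq2SeqTrainingArguments(
    output_dir="./trocr-latex",      # illustrative paths and hyperparameters
    per_device_train_batch_size=8,
    num_train_epochs=1,
    fp16=True,                       # mixed precision helps on a Kaggle GPU
    logging_steps=100,
)

Seq2SeqTrainer(model=model, args=args, train_dataset=train_dataset).train()
```

The `-100` label masking follows the `transformers` convention of excluding padding positions from the loss; for the notebook's exact setup, see the Kaggle link above.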