Young Ho Shin committed
Commit • f369852
1 Parent(s): 36bccd1
Clean up app.py and article.md

Files changed:
- app.py +3 -4
- article.md +41 -15
app.py
CHANGED

@@ -34,17 +34,17 @@ def process_image(image):
 # !ls examples | grep png
 
 # +
-title = "Convert
+title = "Convert image to LaTeX source code"
 
 with open('article.md',mode='r') as file:
     article = file.read()
 
 description = """
-This is a demo of machine learning model trained to
+This is a demo of machine learning model trained to reconstruct the LaTeX source code of an equation from an image.
 To use it, simply upload an image or use one of the example images below and click 'submit'.
 Results will show up in a few seconds.
 
-Try rendering the
+Try rendering the generated LaTeX [here](https://quicklatex.com/) to compare with the original.
 (The model is not perfect yet, so you may need to edit the resulting LaTeX a bit to get it to render a good match.)
 
 """
@@ -61,7 +61,6 @@ examples = [
 [ "examples/7afdeff0e6.png" ],
 [ "examples/b8f1e64b1f.png" ],
 ]
-#examples =[["examples/image_0.png"], ["image_1.png"], ["image_2.png"]]
 # -
 
 iface = gr.Interface(fn=process_image,
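For context, the diff above shows only fragments of app.py. Below is a minimal sketch of how these pieces typically fit together in a Gradio demo of a TrOCR-style model; the body of `process_image` and the `microsoft/trocr-base-stage1` checkpoint are illustrative assumptions, not the repository's actual code (the app would load its own fine-tuned checkpoint).

```python
# Minimal sketch of an image-to-LaTeX Gradio demo.
# Assumption: the checkpoint name and the process_image body are illustrative
# stand-ins, not the actual contents of this repo's app.py.
import gradio as gr
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Load a TrOCR-style encoder-decoder model (replace with the fine-tuned checkpoint).
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-stage1")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-stage1")

def process_image(image):
    # Turn the uploaded PIL image into pixel values, generate token ids,
    # and decode them back into a LaTeX string.
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

title = "Convert image to LaTeX source code"

with open('article.md', mode='r') as file:
    article = file.read()

description = "Upload an image of an equation to get its LaTeX source."

examples = [["examples/7afdeff0e6.png"], ["examples/b8f1e64b1f.png"]]

iface = gr.Interface(fn=process_image,
                     inputs=gr.Image(type="pil"),
                     outputs="text",
                     title=title,
                     description=description,
                     article=article,
                     examples=examples)

if __name__ == "__main__":
    iface.launch()
```

`gr.Interface` simply wires the prediction function to an image input and a text output, which is consistent with the `title`, `description`, `article`, and `examples` arguments visible in the diff.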
article.md
CHANGED

@@ -14,8 +14,8 @@ and the corresponding LaTeX code:
 ```
 
 
-This demo is a first step in solving
-Eventually, you'll be able to take a quick screenshot
+This demo is a first step in solving this problem.
+Eventually, you'll be able to take a quick partial screenshot from a paper
 and a program built with this model will generate its corresponding LaTeX source code
 so that you can just copy/paste straight into your personal notes.
 No more endless googling obscure LaTeX syntax!
@@ -24,25 +24,51 @@ No more endless googling obscure LaTeX syntax!
 
 Because this problem involves looking at an image and generating valid LaTeX code,
 the model needs to understand both Computer Vision (CV) and Natural Language Processing (NLP).
-There are some other projects that aim to solve the same problem with some very interesting
-
+There are some other projects that aim to solve the same problem with some very interesting models.
+These generally involve some kind of "encoder" that looks at the image and extracts/encodes the information about the equation from the image,
 and a "decoder" that takes that information and translates it into what is hopefully both valid and accurate LaTeX code.
+The "encode" part can be done using classic CNN architectures commonly used for CV tasks, or newer vision transformer architectures.
+The "decode" part can be done with LSTMs or transformer decoders, using an attention mechanism to make sure the decoder understands long-range dependencies, e.g. remembering to close a bracket that was opened a long sequence away.
 
-
-...
-
-I chose to tackle this problem with transfer learning.
+I chose to tackle this problem with transfer learning, using an existing OCR model and fine-tuning it for this task.
 The biggest reason for this is computing constraints -
-
+GPU hours are expensive, so I wanted training to be reasonably fast, on the order of a couple of hours.
 There are some other benefits to this approach,
-e.g. the architecture is already proven to be robust
+e.g. the architecture is already proven to be robust.
+I chose [TrOCR](https://arxiv.org/abs/2109.10282), a model trained at Microsoft for text recognition tasks, which uses a transformer architecture for both the encoder and decoder.
+
+For the data, I used the `im2latex-100k` dataset, which includes a total of roughly 100k formulas and images.
+Some preprocessing steps were done by Harvard NLP for the [`im2markup` project](https://github.com/harvardnlp/im2markup).
+To limit the scope of the project and simplify the task, I limited the training data to equations containing 100 LaTeX tokens or fewer.
+This covers most single-line equations, including fractions, subscripts, symbols, etc., but does not cover large multi-line equations, some of which can have up to 500 LaTeX tokens.
+GPU training was done on Kaggle in roughly 3 hours.
+You can find the full training code on my Kaggle profile [here](https://www.kaggle.com/code/younghoshin/finetuning-trocr/notebook).
+
+## What's next?
+
+There are multiple improvements that I'm hoping to make to this project.
+
+### More robust prediction
+
+If you've tried the examples above (randomly sampled from the test set), you may have noticed that the predictions aren't quite perfect and the model occasionally misses, duplicates, or mistakes tokens.
+More training on the existing data set could help with this.
+
+### More data
+
+There's a lot of LaTeX data available on the internet besides `im2latex-100k`, e.g. arXiv and Wikipedia.
+It's just waiting to be scraped and used for this project.
+This means a lot of hours of scraping, cleaning, and processing, but having a more diverse set of input images could improve model accuracy significantly.
+
+### Faster and smaller model
+
+The model currently takes a few seconds to process a single image.
+I would love to improve performance so that it can run in one second or less, maybe even on mobile devices.
+This might be impossible with TrOCR, which is a fairly large model designed for use on GPUs.
 
-I chose TrOCR, an OCR machine learning model trained by Microsoft on SRIOE data to produce text from receipts.
 
 <p style='text-align: center'>Made by Young Ho Shin</p>
 <p style='text-align: center'>
-<a href = "mailto: yhshin.data@gmail.com">Email</a> |
-<a href='https://www.github.com/yhshin11'>Github</a> |
-<a href='https://www.linkedin.com/in/young-ho-shin-3995051b9/'>Linkedin</a>
-
+<a href = "mailto: yhshin.data@gmail.com">Email</a> |
+<a href='https://www.github.com/yhshin11'>Github</a> |
+<a href='https://www.linkedin.com/in/young-ho-shin-3995051b9/'>Linkedin</a>
 </p>
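As a companion to the training description added in article.md, here is a minimal sketch of what fine-tuning a TrOCR-style encoder-decoder on `im2latex-100k` with a 100-token cutoff might look like using Hugging Face `transformers`. The data layout, the token-counting method, and all hyperparameters are assumptions for illustration; the actual training code is the Kaggle notebook linked in the article.

```python
# Sketch of fine-tuning a TrOCR-style encoder-decoder on image/LaTeX pairs.
# Assumptions: the sample format (image_path, latex_string), the use of the
# TrOCR tokenizer to approximate the 100-LaTeX-token cutoff, and the
# hyperparameters are illustrative, not the author's actual training code.
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-stage1")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-stage1")

# The decoder needs to know the generation start token and the padding token.
model.config.decoder_start_token_id = processor.tokenizer.cls_token_id
model.config.pad_token_id = processor.tokenizer.pad_token_id

class LatexDataset(Dataset):
    """Pairs of (equation image, LaTeX string), keeping only short formulas."""
    def __init__(self, samples, processor, max_tokens=100):
        # samples: list of (image_path, latex_string) pairs.
        # Keep equations with at most `max_tokens` tokens, as in the article.
        self.processor = processor
        self.samples = [
            (path, latex) for path, latex in samples
            if len(processor.tokenizer(latex).input_ids) <= max_tokens
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, latex = self.samples[idx]
        image = Image.open(path).convert("RGB")
        pixel_values = self.processor(images=image, return_tensors="pt").pixel_values
        labels = self.processor.tokenizer(
            latex, padding="max_length", max_length=100, truncation=True
        ).input_ids
        # Ignore padding positions when computing the loss.
        labels = [t if t != self.processor.tokenizer.pad_token_id else -100 for t in labels]
        return {"pixel_values": pixel_values.squeeze(0), "labels": torch.tensor(labels)}

def train(samples, epochs=1, lr=5e-5, batch_size=8):
    # Plain fine-tuning loop; the cross-entropy loss comes from the model itself.
    dataset = LatexDataset(samples, processor)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            outputs = model(pixel_values=batch["pixel_values"].to(device),
                            labels=batch["labels"].to(device))
            outputs.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```

Note that counting tokens with the TrOCR tokenizer is only an approximation of the article's "LaTeX tokens", which in the im2latex/im2markup preprocessing refer to the pre-tokenized formula symbols; see the linked Kaggle notebook for the real setup.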