Vít Novotný committed on
Commit
f353573
1 Parent(s): c5f9bdf

Update link to repository

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -14,13 +14,13 @@ CLEF 2022 and first released in [this repository][2]. This model is case-sensiti
  it makes a difference between english and English.

  [1]: https://www.cs.rit.edu/~dprl/ARQMath/
- [2]: https://github.com/witiko/scm-at-arqmath3/blob/main/03-finetune-roberta.ipynb
+ [2]: https://github.com/witiko/scm-at-arqmath3

  ## Model description

- MathBERTa is [the RoBERTa base transformer model][3] whose tokenizer has been
- extended with LaTeX math symbols and which has been fine-tuned on a large
- corpus of English mathematical texts.
+ MathBERTa is [the RoBERTa base transformer model][3] whose [tokenizer has been
+ extended with LaTeX math symbols][7] and which has been [fine-tuned on a large
+ corpus of English mathematical texts][8].

  Like RoBERTa, MathBERTa has been fine-tuned with the Masked language modeling
  (MLM) objective. Taking a sentence, the model randomly masks 15% of the words
@@ -30,6 +30,8 @@ learns an inner representation of the English language and the language of
  LaTeX that can then be used to extract features useful for downstream tasks.

  [3]: https://huggingface.co/roberta-base
+ [7]: https://github.com/Witiko/scm-at-arqmath3/blob/main/02-train-tokenizers.ipynb
+ [8]: https://github.com/witiko/scm-at-arqmath3/blob/main/03-finetune-roberta.ipynb

  ## Intended uses & limitations
