witiko/mathberta

@@ -95,14 +95,21 @@ output = model(**encoded_input)
 ## Training data
-The RoBERTa model was fine-tuned on two datasets:
 - [ArXMLiv 2020][5], a dataset consisting of 1,581,037 ArXiv documents.
 - [Math StackExchange][6], a dataset of 2,466,080 questions and answers.
 Together, these datasets weigh 52GB of text and LaTeX.
  [5]: https://sigmathling.kwarc.info/resources/arxmliv-dataset-2020/
  [6]: https://www.cs.rit.edu/~dprl/ARQMath/arqmath-resources.html
  [9]: https://github.com/huggingface/transformers/issues/16936
  [10]: https://github.com/huggingface/transformers/pull/17119

 ## Training data
+Our model was fine-tuned on two datasets:
 - [ArXMLiv 2020][5], a dataset consisting of 1,581,037 ArXiv documents.
 - [Math StackExchange][6], a dataset of 2,466,080 questions and answers.
 Together, these datasets weigh 52GB of text and LaTeX.
+## Intrinsic evaluation results
+Our model achieves the following intrinsic evaluation results:
+ ![Intrinsic evaluation results of MathBERTa][11]
  [5]: https://sigmathling.kwarc.info/resources/arxmliv-dataset-2020/
  [6]: https://www.cs.rit.edu/~dprl/ARQMath/arqmath-resources.html
  [9]: https://github.com/huggingface/transformers/issues/16936
  [10]: https://github.com/huggingface/transformers/pull/17119
+ [11]: https://huggingface.co/witiko/mathberta/resolve/main/learning-curves.png