Update README.md
README.md CHANGED
@@ -41,6 +41,7 @@ widget:
- [Training procedure](#training-procedure)
- [Evaluation](#evaluation)
- [Additional information](#additional-information)
+ - [Author](#author)
- [Contact information](#contact-information)
- [Copyright](#copyright)
- [Licensing information](#licensing-information)
@@ -124,10 +125,13 @@ Some of the statistics of the corpus:

### Training procedure
The configuration of the **RoBERTa-large-bne** model is as follows:
+
- RoBERTa-l: 24-layer, 1024-hidden, 16-heads, 355M parameters.

The pretraining objective used for this architecture is masked language modeling without next sentence prediction.
+
The training corpus has been tokenized using a byte-level version of Byte-Pair Encoding (BPE), as used in the original [RoBERTa](https://arxiv.org/abs/1907.11692) model, with a vocabulary size of 50,262 tokens.
+
The RoBERTa-large-bne pre-training consists of masked language model training that follows the approach employed for RoBERTa base. The training lasted a total of 96 hours on 32 computing nodes, each with 4 NVIDIA V100 GPUs of 16 GB VRAM.

## Evaluation
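The architecture bullet in the hunk above (24 layers, 1024 hidden size, 16 attention heads, 355M parameters) can be compared against the published configuration. A minimal sketch with Hugging Face Transformers, assuming the model is hosted on the Hub as `PlanTL-GOB-ES/roberta-large-bne` (the repository id is not stated in this diff and is an assumption here):

```python
# Minimal sketch: inspect the RoBERTa-large-bne configuration and compare it
# with the figures quoted in the model card. The Hub id is an assumption.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("PlanTL-GOB-ES/roberta-large-bne")  # assumed repository id

print(config.num_hidden_layers)    # card says 24-layer
print(config.hidden_size)          # card says 1024-hidden
print(config.num_attention_heads)  # card says 16-heads
```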
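Similarly, the byte-level BPE vocabulary of 50,262 tokens mentioned above can be checked by loading the tokenizer. The Hub id is the same assumption as before, and the sample sentence is purely illustrative:

```python
# Minimal sketch: load the byte-level BPE tokenizer and confirm the vocabulary
# size quoted in the model card. The Hub id is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/roberta-large-bne")  # assumed repository id

print(tokenizer.vocab_size)  # card quotes 50,262 tokens
print(tokenizer.tokenize("Este modelo fue preentrenado con un corpus en español."))
```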
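Finally, since the pretraining objective is masked language modeling with no next-sentence prediction, the model can be exercised directly through the fill-mask pipeline. Again a sketch under the same assumed Hub id; the example sentence and its predictions are illustrative only:

```python
# Minimal sketch: query the masked-language-modeling head via the fill-mask
# pipeline. RoBERTa-style tokenizers use "<mask>" as the mask token.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="PlanTL-GOB-ES/roberta-large-bne")  # assumed repository id

for prediction in unmasker("Madrid es la <mask> de España."):
    print(prediction["token_str"], round(prediction["score"], 3))
```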