Commit fd8f5e0 · verified · committed by eduagarcia · 1 Parent(s): 968f303

Update README.md

  1. README.md +23 -19
README.md CHANGED
@@ -116,30 +116,34 @@ We adopted the standard [RoBERTa hyperparameters](https://arxiv.org/abs/1907.116
 
  ## Evaluation
 
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
  The model was evaluated on ["PortuLex" benchmark](eduagarcia/portuguese_benchmark), a four-task benchmark designed to evaluate the quality and performance of language models in the Portuguese legal domain.
 
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
+ Macro F1-Score (\%) for multiple models evaluated on PortuLex benchmark test splits:
+
+ | **Model** | **LeNER** | **UlyNER-PL** | **FGV-STF** | **RRIP** | **Average (%)** |
+ |----------------------------------------------------------------------------|-----------|-----------------|-------------|:---------:|-----------------|
+ | | | Coarse/Fine | Coarse | | |
+ | [BERTimbau-base](https://dl.acm.org/doi/abs/10.1007/978-3-030-61377-8_28) | 88.34 | 86.39/83.83 | 79.34 | 82.34 | 83.78 |
+ | [BERTimbau-large](https://dl.acm.org/doi/abs/10.1007/978-3-030-61377-8_28) | 88.64 | 87.77/84.74 | 79.71 | **83.79** | 84.60 |
+ | [Albertina-PT-BR-base](https://arxiv.org/abs/2305.06721) | 89.26 | 86.35/84.63 | 79.30 | 81.16 | 83.80 |
+ | [Albertina-PT-BR-xlarge](https://arxiv.org/abs/2305.06721) | 90.09 | 88.36/**86.62** | 79.94 | 82.79 | 85.08 |
+ | [BERTikal-base](https://arxiv.org/abs/2110.15709) | 83.68 | 79.21/75.70 | 77.73 | 81.11 | 79.99 |
+ | [JurisBERT-base](https://repositorio.ufms.br/handle/123456789/5119) | 81.74 | 81.67/77.97 | 76.04 | 80.85 | 79.61 |
+ | [BERTimbauLAW-base](https://repositorio.ufms.br/handle/123456789/5119) | 84.90 | 87.11/84.42 | 79.78 | 82.35 | 83.20 |
+ | [Legal-XLM-R-base](https://arxiv.org/abs/2306.02069) | 87.48 | 83.49/83.16 | 79.79 | 82.35 | 83.24 |
+ | [Legal-XLM-R-large](https://arxiv.org/abs/2306.02069) | 88.39 | 84.65/84.55 | 79.36 | 81.66 | 83.50 |
+ | [Legal-RoBERTa-PT-large](https://arxiv.org/abs/2306.02069) | 87.96 | 88.32/84.83 | 79.57 | 81.98 | 84.02 |
+ | RoBERTaTimbau-base | 89.68 | 87.53/85.74 | 78.82 | 82.03 | 84.29 |
+ | RoBERTaLegalPT-base | 90.59 | 85.45/84.40 | 79.92 | 82.84 | 84.57 |
+ | RoBERTaLexPT-base | **90.73** | **88.56**/86.03 | **80.40** | 83.22 | **85.41** |
+
+ In summary, RoBERTaLexPT consistently achieves top legal NLP effectiveness despite its base size.
+ With sufficient pre-training data, it can surpass overparameterized models. The results highlight the importance of domain-diverse training data over sheer model scale.
 
  ## Citation
 
-
  [More Information Needed]
 
  ## Acknowledgment
+
+ This work has been supported by the AI Center of Excellence (Centro de Excelência em Inteligência Artificial – CEIA) of the Institute of Informatics at the Federal University of Goiás (INF-UFG).
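
Note on the `Average (%)` column: the diff does not say how it is computed. The reported values are consistent with first averaging the two UlyNER-PL scores (Coarse and Fine) into one dataset score, then taking the unweighted mean over the four datasets. The sketch below checks that reading against three rows copied from the table. The function name `portulex_average` and the `rows` dictionary are illustrative assumptions, not part of the README or of any benchmark tooling.

```python
def portulex_average(lener, ulyner_coarse, ulyner_fine, fgv_stf, rrip):
    """Assumed aggregation: mean over four datasets, with UlyNER-PL
    represented by the mean of its coarse and fine scores."""
    ulyner = (ulyner_coarse + ulyner_fine) / 2
    return (lener + ulyner + fgv_stf + rrip) / 4


# Scores copied from the table: (LeNER, UlyNER coarse, UlyNER fine, FGV-STF, RRIP)
rows = {
    "BERTimbau-base": (88.34, 86.39, 83.83, 79.34, 82.34),
    "Albertina-PT-BR-xlarge": (90.09, 88.36, 86.62, 79.94, 82.79),
    "RoBERTaLexPT-base": (90.73, 88.56, 86.03, 80.40, 83.22),
}

# Round to two decimals to match the precision used in the table.
averages = {name: round(portulex_average(*scores), 2) for name, scores in rows.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
```

Under this assumption the three rows come out as 83.78, 85.08, and 85.41, matching the table. A plain mean over all five score columns would give different values, so the two UlyNER-PL variants appear to count as a single task.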