---
license: eupl-1.1
datasets:
- EuropeanParliament/cellar_eurovoc
language:
- en
metrics:
- type: f1
  value: 0.XX
  name: micro F1
  args:
    threshold: 0.XX
- type: NDCG@3
  value: 0.X
  name: NDCG@3
- type: NDCG@5
  value: 0.XX
  name: NDCG@5
- type: NDCG@10
  value: 0.XX
  name: NDCG@10
tags:
- eurovoc
pipeline_tag: text-classification
widget:
- text: "The Union condemns the continuing grave human rights violations by the Myanmar armed forces, including torture, sexual and gender-based violence, the persecution of civil society actors, human rights defenders and journalists, and attacks on the civilian population, including ethnic and religious minorities."
---

# Eurovoc Multilabel Classifier

[EuroVoc](https://op.europa.eu/fr/web/eu-vocabularies) is a large multidisciplinary, multilingual, hierarchical thesaurus of more than 7,000 classes covering the activities of the EU institutions.
Given the number of legal documents produced every day and the large backlog of existing documents still to be classified, high-quality automated or semi-automated classification methods are valuable in this domain.

This model, based on the BERT deep neural network, was trained on more than 3,200,000 documents for this task and is used in production through a Hugging Face inference endpoint.
It supports the 24 official languages of the European Union.

## Architecture

![architecture](architecture.png)

This classification model is built on top of [EUBERT](https://huggingface.co/EuropeanParliament/EUBERT) and covers 7,331 EuroVoc labels.

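The example response in this card shows several labels scoring close to 1 at the same time, which is consistent with each label being scored independently rather than competing in a softmax. The following is a minimal sketch of that kind of multilabel scoring, assuming one sigmoid per label on top of the encoder output. This is an assumption, not a description of the actual head (the architecture diagram remains the reference), and `score_labels` and the toy logits are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score_labels(logits, label_names):
    """Map one logit per label to an independent probability.

    Assumption: the classifier emits one logit per EuroVoc label and each
    label is decided on its own, so the scores need not sum to 1.
    """
    return {name: sigmoid(z) for name, z in zip(label_names, logits)}


# Toy logits for three hypothetical labels (not real model output).
scores = score_labels([4.0, 3.5, -2.0], ["label A", "label B", "label C"])
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

With this formulation two labels can both receive a high score for the same document, which is what a multilabel thesaurus such as EuroVoc requires.
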
## Usage

```python
from eurovoc import EurovocTagger

# Load the pretrained tagger from the Hugging Face Hub
model = EurovocTagger.from_pretrained("EuropeanParliament/eurovoc_eu")
```

## Metrics

### Eurlex57k Dataset

| Metric   | Value | Threshold |
|----------|-------|-----------|
| Micro F1 | 0.XX  | 0.XX      |
| NDCG@3   | 0.XX  | -         |
| NDCG@5   | 0.XX  | -         |
| NDCG@10  | 0.XX  | -         |

These values are in line with the state of the art in the field; see the publication [Large Scale Legal Text Classification Using Transformer Models](https://arxiv.org/pdf/2010.12871.pdf).

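For readers unfamiliar with the two metric families reported here, the sketch below shows one common way to compute them: micro F1 pools true positives, false positives and false negatives over all documents after thresholding the scores, while NDCG@k only looks at the ranking of the top k labels. It assumes binary relevance (a label is either assigned to the document or not) and is not the evaluation script used for this model; `micro_f1`, `ndcg_at_k` and the toy data are hypothetical.

```python
import math


def micro_f1(y_true, y_score, threshold):
    """Micro-averaged F1 over documents.

    y_true:  list of sets of gold labels, one set per document.
    y_score: list of dicts mapping label -> score, one dict per document.
    """
    tp = fp = fn = 0
    for true_labels, scores in zip(y_true, y_score):
        predicted = {label for label, s in scores.items() if s >= threshold}
        tp += len(predicted & true_labels)
        fp += len(predicted - true_labels)
        fn += len(true_labels - predicted)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def ndcg_at_k(true_labels, scores, k):
    """NDCG@k with binary relevance for a single document."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, label in enumerate(ranked)
        if label in true_labels
    )
    ideal_hits = min(k, len(true_labels))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Toy example: one document with two gold labels.
gold = [{"a", "b"}]
pred = [{"a": 0.9, "c": 0.8, "b": 0.7}]
print(micro_f1(gold, pred, threshold=0.75))
print(ndcg_at_k(gold[0], pred[0], k=3))
```

Because micro F1 depends on the decision threshold while NDCG@k does not, only the micro F1 row of the table carries a threshold value.
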
## Inference Endpoint

### Payload example

```json
{
  "inputs": "The Union condemns the continuing grave human rights violations by the Myanmar armed forces, including torture, sexual and gender-based violence, the persecution of civil society actors, human rights defenders and journalists, and attacks on the civilian population, including ethnic and religious minorities. ",
  "topk": 10,
  "threshold": 0.16
}
```

Result:

```json
{"results": [{"label": "international sanctions", "score": 0.9994925260543823},
             {"label": "economic sanctions", "score": 0.9991770386695862},
             {"label": "natural person", "score": 0.9591936469078064},
             {"label": "EU restrictive measure", "score": 0.8388392329216003},
             {"label": "legal person", "score": 0.45630475878715515},
             {"label": "Burma/Myanmar", "score": 0.43375277519226074}]}
```

Only six results are returned even though `topk` is 10, because the score of the next label is below the 0.16 threshold.

Default values: `topk` = 5 and `threshold` = 0.16.

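The sketch below shows one way a client could build this payload and reproduce the `topk` / `threshold` behaviour described above. It uses only the Python standard library. The endpoint URL, the token, the bearer-token header and the helper names `build_request` and `select_labels` are all assumptions for illustration; the filtering rule (keep at most `topk` labels, drop scores below `threshold`) is inferred from the example and may not match the server implementation exactly.

```python
import json
import urllib.request

# Hypothetical values: replace with your own endpoint URL and access token.
ENDPOINT_URL = "https://example.invalid/eurovoc"
API_TOKEN = "hf_xxx"


def build_request(text, topk=5, threshold=0.16):
    """Build (but do not send) a POST request carrying the JSON payload."""
    payload = {"inputs": text, "topk": topk, "threshold": threshold}
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def select_labels(scored, topk=5, threshold=0.16):
    """Client-side mirror of the documented behaviour (an assumption):
    rank by score, keep at most `topk`, drop scores below `threshold`."""
    ranked = sorted(scored, key=lambda item: item["score"], reverse=True)
    return [item for item in ranked[:topk] if item["score"] >= threshold]


request = build_request("The Union condemns ...", topk=10, threshold=0.16)
# To actually call the endpoint:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)["results"]
```

Keeping the request construction separate from the network call makes the payload easy to inspect before anything is sent.
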
## Author(s)

Sébastien Campion <sebastien.campion@europarl.europa.eu>