|
--- |
|
datasets: |
|
- gguichard/coref_dataset |
|
language: |
|
- fr |
|
library_name: transformers |
|
widget: |
|
- text: "Un homme, qui parle à son collègue, <s'> avance vers moi." |
|
--- |
|
|
|
# Easter-Island/coref_classifier_ancor |
|
|
|
## Table of Contents |
|
- [Model Details](#model-details) |
|
- [Uses](#uses) |
|
- [Risks, Limitations and Biases](#risks-limitations-and-biases) |
|
- [Training](#training) |
|
- [Evaluation](#evaluation) |
|
- [Citation Information](#citation-information) |
|
- [How to Get Started With the Model](#how-to-get-started-with-the-model) |
|
|
|
## Model Details
|
- **Model Description:** |
|
This model is a token-classification model for French coreference resolution, fine-tuned from CamemBERT-large.
|
- **Developed by:** Grégory Guichard |
|
- **Model Type:** Token Classification |
|
- **Language(s):** French |
|
- **License:** MIT |
|
- **Parent Model:** [camembert-large](https://huggingface.co/camembert/camembert-large), a RoBERTa-based French language model.
|
- **Resources for more information:** |
|
|
|
|
|
## Uses |
|
|
|
This model can be used for coreference token-classification tasks.

For each token, the model predicts whether it corefers with the expression enclosed in `<>`.
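The `<>` markers can be added programmatically. The helper below is a hypothetical convenience sketch (not part of the model's API) that wraps a character span in the markers:

```python
def mark_mention(text: str, start: int, end: int) -> str:
    """Wrap the character span [start, end) in <> so the classifier
    knows which expression to resolve."""
    return text[:start] + "<" + text[start:end] + ">" + text[end:]

marked = mark_mention("Un homme me parle. Il est beau.", 19, 21)
# marked == "Un homme me parle. <Il> est beau."
```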
|
|
|
### Example |
|
|
|
```python |
|
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model = AutoModelForTokenClassification.from_pretrained("Easter-Island/coref_classifier_ancor")

tokenizer = AutoTokenizer.from_pretrained("Easter-Island/coref_classifier_ancor")

classifier = pipeline("ner", model=model, tokenizer=tokenizer)
|
|
|
text = "Un homme me parle. <Il> est beau." |
|
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1'] |
|
# results |
|
['▁Un', '▁homme'] |
|
``` |
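The pipeline returns SentencePiece subword tokens, with `▁` marking word boundaries. A small helper, shown here as a sketch, can filter the `LABEL_1` predictions and join the pieces back into readable text:

```python
def antecedent_text(predictions):
    """Join the subword pieces predicted as LABEL_1 into a plain string.
    `predictions` is the list of dicts returned by the "ner" pipeline."""
    pieces = [p["word"] for p in predictions if p["entity"] == "LABEL_1"]
    return "".join(pieces).replace("\u2581", " ").strip()

# For the output above:
antecedent_text([{"word": "\u2581Un", "entity": "LABEL_1"},
                 {"word": "\u2581homme", "entity": "LABEL_1"}])
# -> 'Un homme'
```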
|
|
|
This coreference resolver handles several kinds of coreference:
|
|
|
### Pronominal anaphora
|
```python |
|
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model = AutoModelForTokenClassification.from_pretrained("Easter-Island/coref_classifier_ancor")

tokenizer = AutoTokenizer.from_pretrained("Easter-Island/coref_classifier_ancor")

classifier = pipeline("ner", model=model, tokenizer=tokenizer)
|
|
|
text = "Platon est un philosophe antique de la Grèce classique... Il reprit le travail philosophique de certains de <ses> prédécesseurs"
|
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1'] |
|
# results |
|
['▁Platon'] |
|
|
|
text = "Platon est un philosophe antique de la Grèce classique... <Il> reprit le travail philosophique de certains de ses prédécesseurs"
|
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1'] |
|
# results |
|
['▁Platon', '▁un', '▁philosophe', '▁antique', '▁de', '▁la', '▁Grèce', '▁classique', '▁ses']
|
|
|
``` |
|
|
|
### Faithful anaphora
|
```python |
|
from transformers import pipeline |
|
classifier = pipeline("ner", model=model, tokenizer=tokenizer) |
|
|
|
text = "Le chat que j’ai adopté court partout... Mais j’aime beaucoup <ce chat> ." |
|
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1'] |
|
# results |
|
['▁Le', '▁chat'] |
|
``` |
|
|
|
### Unfaithful anaphora
|
```python |
|
from transformers import pipeline |
|
classifier = pipeline("ner", model=model, tokenizer=tokenizer) |
|
|
|
text = "Le chat que j’ai adopté court partout... Mais j’aime beaucoup <cet animal> ." |
|
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1'] |
|
# results |
|
['▁Le', '▁chat'] |
|
``` |
|
|
|
### Reported speech
|
```python |
|
from transformers import pipeline |
|
classifier = pipeline("ner", model=model, tokenizer=tokenizer) |
|
|
|
text = """Lionel Jospin se livre en revanche à une longue analyse de son échec du 21 avril. “Ma part de responsabilité dans l’échec existe forcément. <Je> l’ai assumée en quittant la vie politique”""" |
|
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1'] |
|
# results |
|
['▁Lionel', '▁Jos', 'pin', '▁son', 'Ma'] |
|
``` |
|
|
|
### Named entities
|
```python |
|
from transformers import pipeline |
|
classifier = pipeline("ner", model=model, tokenizer=tokenizer) |
|
|
|
text = "Paris est située sur la Seine. <La plus grande ville de France> compte plus de 10 millions d’habitants." |
|
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1'] |
|
# results |
|
['▁Paris'] |
|
``` |
|
### Groups
|
```python |
|
from transformers import pipeline |
|
classifier = pipeline("ner", model=model, tokenizer=tokenizer) |
|
|
|
text = "Jack et Rose commencent à faire connaissance. Ils s’entendent bien. <Le couple> se marie et a des enfants." |
|
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1'] |
|
# results |
|
['▁Jack', '▁et', '▁Rose', '▁Ils'] |
|
``` |
|
|
|
### Split antecedents
|
```python |
|
from transformers import pipeline |
|
classifier = pipeline("ner", model=model, tokenizer=tokenizer) |
|
|
|
text = "Pierre retrouva sa femme au restaurant. <Le couple> dina jusqu'à tard dans la nuit." |
|
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1'] |
|
# results |
|
['▁sa', '▁femme'] # note: this is an error; "Pierre" should also be detected
|
``` |
|
|
|
|
|
## Risks, Limitations and Biases |
|
|
|
|
|
|
|
## Training |
|
|
|
|
|
#### Training Data |
|
|
|
|
|
|
|
#### Training Procedure |
|
|
|
|
|
|
|
## Evaluation |
|
|
|
|
|
|
|
## Citation Information |
|
|
|
|
|
## How to Get Started With the Model |
|
|
|
|