---
datasets:
- gguichard/coref_dataset
language:
- fr
library_name: transformers
widget:
- text: "Un homme, qui parle à son collègue, <s'> avance vers moi."
---
# Easter-Island/coref_classifier_ancor
## Table of Contents
- [Model Details](#model-details)
- [Uses](#uses)
- [Risks, Limitations and Biases](#risks-limitations-and-biases)
- [Training](#training)
- [Evaluation](#evaluation)
- [Citation Information](#citation-information)
- [How to Get Started With the Model](#how-to-get-started-with-the-model)
## Model Details
- **Model Description:** A language model for French coreference resolution, framed as token classification.
- **Developed by:** Grégory Guichard
- **Model Type:** Token Classification
- **Language(s):** French
- **License:** MIT
- **Parent Model:** See the [CamemBERT-large model](https://huggingface.co/camembert/camembert-large) for more information about the RoBERTa-based parent model.
- **Resources for more information:**
## Uses
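This model can be used for coreference resolution framed as token classification.
One expression in the input text is wrapped in angle brackets (`<` and `>`), and the model predicts, for each token, whether it refers to the same entity as that marked expression.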
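Placing the brackets by hand is easy to get wrong in longer texts. As a minimal sketch, the helper below wraps a character span in angle brackets before the text is passed to the classifier. `mark_mention` is a hypothetical name introduced here for illustration; it is not part of the model or of `transformers`, and rejecting texts that already contain angle brackets is an assumption of this sketch, not a documented requirement of the model.

```python
def mark_mention(text: str, start: int, end: int) -> str:
    """Wrap the characters text[start:end] in angle brackets.

    Hypothetical helper: the model only expects one marked mention,
    so this sketch refuses texts that already contain '<' or '>'.
    """
    if not 0 <= start < end <= len(text):
        raise ValueError("start and end must delimit a non-empty span inside text")
    if "<" in text or ">" in text:
        raise ValueError("text already contains angle brackets")
    return f"{text[:start]}<{text[start:end]}>{text[end:]}"


sentence = "Un homme me parle. Il est beau."
start = sentence.index("Il")  # character offset of the pronoun to resolve
marked = mark_mention(sentence, start, start + len("Il"))
print(marked)  # Un homme me parle. <Il> est beau.
```

The resulting string can then be given to the classifier in the same way as the hand-written inputs in this card.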
### Example
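```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Assumption: the checkpoint is available on the Hub under the ID in this card's title.
model_id = "Easter-Island/coref_classifier_ancor"
model = AutoModelForTokenClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
classifier = pipeline("ner", model=model, tokenizer=tokenizer)

text = "Un homme me parle. <Il> est beau."
# Keep the tokens labelled LABEL_1, i.e. predicted to refer to the marked expression.
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results
['▁Un', '▁homme']
```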
This coreference resolver handles several types of coreference:
### Pronominal anaphora
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
# Load the model and tokenizer from local checkpoint directories.
model = AutoModelForTokenClassification.from_pretrained("models/merged/ancor_classifier")
tokenizer = AutoTokenizer.from_pretrained("models/merged/ancor_classifier_tokenizer")
classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = "Platon est un philosophe antique de la Grèce classique... Il reprit le travail philosophique decertains de <ses> prédécesseurs"
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results
['▁Platon']
text = "Platon est un philosophe antique de la Grèce classique... <Il> reprit le travail philosophique decertains de ses prédécesseurs"
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results
['▁Platon', '▁un', '▁philosophe', '▁antique', '▁de', '▁la', '▁Grèce', '▁classique', '▁ses']
```
### Faithful anaphora
```python
from transformers import pipeline
classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = "Le chat que j’ai adopté court partout... Mais j’aime beaucoup <ce chat> ."
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results
['▁Le', '▁chat']
```
### Unfaithful anaphora
```python
from transformers import pipeline
classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = "Le chat que j’ai adopté court partout... Mais j’aime beaucoup <cet animal> ."
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results
['▁Le', '▁chat']
```
### Reported speech
```python
from transformers import pipeline
classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = """Lionel Jospin se livre en revanche à une longue analyse de son échec du 21 avril. “Ma part de responsabilité dans l’échec existe forcément. <Je> l’ai assumée en quittant la vie politique”"""
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results
['▁Lionel', '▁Jos', 'pin', '▁son', 'Ma']
```
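The raw output lists SentencePiece pieces (`'▁Jos'`, `'pin'`) rather than whole mentions. As a rough sketch, the pieces can be regrouped with the `index` field that the token-classification pipeline attaches to each prediction when no aggregation strategy is set. `group_mentions` is a hypothetical helper, and the `index` values below are invented for illustration; they are not the actual token positions for the sentence above.

```python
def group_mentions(predictions):
    """Group predictions with consecutive token indices into mention strings."""
    mentions = []
    current = ""
    previous_index = None
    for pred in sorted(predictions, key=lambda p: p["index"]):
        # A gap in the token indices means a new mention starts here.
        if previous_index is not None and pred["index"] != previous_index + 1:
            mentions.append(current)
            current = ""
        current += pred["word"]
        previous_index = pred["index"]
    if current:
        mentions.append(current)
    # SentencePiece marks the start of a word with "▁"; turn it back into a space.
    return [mention.replace("▁", " ").strip() for mention in mentions]


# Illustrative predictions: the words come from the example above,
# the indices are made up for this sketch.
predictions = [
    {"word": "▁Lionel", "index": 1},
    {"word": "▁Jos", "index": 2},
    {"word": "pin", "index": 3},
    {"word": "▁son", "index": 14},
    {"word": "Ma", "index": 22},
]
mentions = group_mentions(predictions)
print(mentions)  # ['Lionel Jospin', 'son', 'Ma']
```

Grouping on token adjacency rather than on the `▁` prefix alone avoids gluing `'Ma'` onto `'son'`: `'Ma'` lacks the prefix but is not adjacent to the previous labelled token.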
### Named entities
```python
from transformers import pipeline
classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = "Paris est située sur la Seine. <La plus grande ville de France> compte plus de 10 millions d’habitants."
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results
['▁Paris']
```
### Groups
```python
from transformers import pipeline
classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = "Jack et Rose commencent à faire connaissance. Ils s’entendent bien. <Le couple> se marie et a des enfants."
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results
['▁Jack', '▁et', '▁Rose', '▁Ils']
```
### Scattered groups
```python
from transformers import pipeline
classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = "Pierre retrouva sa femme au restaurant. <Le couple> dina jusqu'à tard dans la nuit."
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results
['▁sa', '▁femme'] # this is an error: "Pierre" should also be returned
```
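Errors like the missing `Pierre` can be spotted by comparing the predicted tokens with the tokens one expects. The sketch below is only an illustration: `compare_tokens` is a hypothetical helper, and the expected list is written by hand from the comment in the example above.

```python
def compare_tokens(predicted, expected):
    """Return (missing, spurious) tokens, ignoring the SentencePiece '▁' marker."""
    predicted_set = {token.lstrip("▁") for token in predicted}
    expected_set = {token.lstrip("▁") for token in expected}
    missing = sorted(expected_set - predicted_set)    # expected but not predicted
    spurious = sorted(predicted_set - expected_set)   # predicted but not expected
    return missing, spurious


predicted = ['▁sa', '▁femme']
expected = ['▁Pierre', '▁sa', '▁femme']  # hand-written expectation
missing, spurious = compare_tokens(predicted, expected)
print(missing)   # ['Pierre']
print(spurious)  # []
```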
## Risks, Limitations and Biases
## Training
#### Training Data
#### Training Procedure
## Evaluation
## Citation Information
## How to Get Started With the Model