---
datasets:
- gguichard/coref_dataset
language:
- fr
library_name: transformers
widget:
- text: "Un homme, qui parle à son collègue, avance vers moi."
---

# Easter-Island/coref_classifier_ancor

## Table of Contents

- [Model Details](#model-details)
- [Uses](#uses)
- [Risks, Limitations and Biases](#risks-limitations-and-biases)
- [Training](#training)
- [Evaluation](#evaluation)
- [Citation Information](#citation-information)
- [How to Get Started With the Model](#how-to-get-started-with-the-model)

## Model Details

- **Model Description:** This model is a state-of-the-art language model for French coreference resolution.
- **Developed by:** Grégory Guichard
- **Model Type:** Token Classification
- **Language(s):** French
- **License:** MIT
- **Parent Model:** [CamemBERT-large](https://huggingface.co/camembert/camembert-large), a French model based on the RoBERTa architecture; see its model card for more information.
- **Resources for more information:**

## Uses

This model can be used for coreference resolution framed as token classification. The input text marks one mention between "<" and ">"; for each token, the model predicts whether it refers to the same entity as that marked mention. Tokens labelled `LABEL_1` are the ones predicted as coreferent.

### Example

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Hub id taken from this card's title
model = AutoModelForTokenClassification.from_pretrained("Easter-Island/coref_classifier_ancor")
tokenizer = AutoTokenizer.from_pretrained("Easter-Island/coref_classifier_ancor")
classifier = pipeline("ner", model=model, tokenizer=tokenizer)

text = "Un homme me parle. est beau."
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results ['▁Un', '▁homme']
```

The following examples illustrate the kinds of coreference the model handles.

### Pronominal reference

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Author's local copies of the checkpoint and tokenizer; replace these paths with your own
model = AutoModelForTokenClassification.from_pretrained("models/merged/ancor_classifier")
tokenizer = AutoTokenizer.from_pretrained("models/merged/ancor_classifier_tokenizer")
classifier = pipeline("ner", model=model, tokenizer=tokenizer)

text = "Platon est un philosophe antique de la Grèce classique... Il reprit le travail philosophique de certains de prédécesseurs"
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results ['▁Platon']

text = "Platon est un philosophe antique de la Grèce classique... reprit le travail philosophique de certains de ses prédécesseurs"
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results ['▁Platon', '▁un', '▁philosophe', '▁antique', '▁de', '▁la', '▁Grèce', '▁classique', '▁ses']
```

### Faithful anaphora (same noun)

```python
from transformers import pipeline

classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = "Le chat que j’ai adopté court partout... Mais j’aime beaucoup ."
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results ['▁Le', '▁chat']
```

### Unfaithful anaphora (different noun)

```python
from transformers import pipeline

classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = "Le chat que j’ai adopté court partout... Mais j’aime beaucoup ."
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results ['▁Le', '▁chat']
```

### Reported speech

```python
from transformers import pipeline

classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = """Lionel Jospin se livre en revanche à une longue analyse de son échec du 21 avril.
“Ma part de responsabilité dans l’échec existe forcément. l’ai assumée en quittant la vie politique”"""
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results ['▁Lionel', '▁Jos', 'pin', '▁son', 'Ma']
```

### Named entities

```python
from transformers import pipeline

classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = "Paris est située sur la Seine. compte plus de 10 millions d’habitants."
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results ['▁Paris']
```

### Groups

```python
from transformers import pipeline

classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = "Jack et Rose commencent à faire connaissance. Ils s’entendent bien. se marie et a des enfants."
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results ['▁Jack', '▁et', '▁Rose', '▁Ils']
```

### Scattered groups

```python
from transformers import pipeline

classifier = pipeline("ner", model=model, tokenizer=tokenizer)
text = "Pierre retrouva sa femme au restaurant. dina jusqu'à tard dans la nuit."
[elem['word'] for elem in classifier(text) if elem['entity'] == 'LABEL_1']
# results ['▁sa', '▁femme']
# Error: '▁Pierre' should also be returned here
```

## Risks, Limitations and Biases

## Training

### Training Data

### Training Procedure

## Evaluation

## Citation Information

## How to Get Started With the Model
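The snippet below is a minimal sketch, not part of the released model, of two small helpers around the `ner` pipeline used in the examples. `mark_mention` wraps a character span in the "<" and ">" markers described under Uses, and `coreferent_words` merges the SentencePiece pieces labelled `LABEL_1` (such as `'▁Jos', 'pin'`) back into whole words using the `index` field the pipeline returns for each token. The helper names, the merging heuristic, and loading by the Hub id `Easter-Island/coref_classifier_ancor` (taken from this card's title, assuming the tokenizer is stored in the same repository) are all assumptions.

```python
from typing import Iterable, List

# Assumption: Hub id taken from this card's title.
MODEL_ID = "Easter-Island/coref_classifier_ancor"


def mark_mention(text: str, start: int, end: int) -> str:
    """Wrap text[start:end] in the "<" and ">" markers the model expects."""
    if not 0 <= start < end <= len(text):
        raise ValueError(f"invalid mention span ({start}, {end})")
    return f"{text[:start]}<{text[start:end]}>{text[end:]}"


def coreferent_words(predictions: Iterable[dict], label: str = "LABEL_1") -> List[str]:
    """Merge sub-tokens predicted as `label` into whole words.

    Heuristic (an assumption, not the author's method): a piece is glued to
    the previous word only if it directly follows it (consecutive `index`)
    and does not start with the SentencePiece word-boundary marker "▁".
    """
    words: List[str] = []
    prev_index = None
    for pred in predictions:
        if pred["entity"] != label:
            continue  # skip tokens not predicted as coreferent
        piece, index = pred["word"], pred["index"]
        if (
            prev_index is not None
            and index == prev_index + 1
            and not piece.startswith("▁")
        ):
            words[-1] += piece  # continuation of the previous word
        else:
            words.append(piece.lstrip("▁"))  # start of a new word
        prev_index = index
    return words


def load_classifier(model_id: str = MODEL_ID):
    """Build the token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    return pipeline("ner", model=model_id)
```

Typical use would be `coreferent_words(load_classifier()(mark_mention(text, start, end)))`, which should yield plain words such as `'Jospin'` instead of `'▁Jos', 'pin'`. No output is shown here because it depends on the model's predictions.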