---
license: mit
base_model: camembert/camembert-large
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: NERmembert-large-3entities
  results: []
datasets:
- CATIE-AQ/frenchNER_3entities
language:
- fr
widget:
- text: >-
    Le dévoilement du logo officiel des JO s'est déroulé le 21 octobre 2019 au Grand Rex. Ce nouvel emblème et cette nouvelle typographie ont été conçus par le designer Sylvain Boyer avec les agences Royalties & Ecobranding. Rond, il rassemble trois symboles : une médaille d'or, la flamme olympique et Marianne, symbolisée par un visage de femme mais privée de son bonnet phrygien caractéristique. La typographie dessinée fait référence à l'Art déco, mouvement artistique des années 1920, décennie pendant laquelle ont eu lieu pour la dernière fois les Jeux olympiques à Paris en 1924. Pour la première fois, ce logo sera unique pour les Jeux olympiques et les Jeux paralympiques.
library_name: transformers
pipeline_tag: token-classification
co2_eq_emissions: 90
new_version: CATIE-AQ/NERmemberta-3entities
---

# NERmembert-large-3entities

## Model Description

We present **NERmembert-large-3entities**, a [CamemBERT large](https://huggingface.co/camembert/camembert-large) model fine-tuned for Named Entity Recognition (NER) in French on five French NER datasets covering 3 entities (LOC, PER, ORG).
These datasets were cleaned and concatenated into a single dataset that we called [frenchNER_3entities](https://huggingface.co/datasets/CATIE-AQ/frenchNER_3entities).
This represents a total of **420,264 rows, of which 346,071 are for training, 32,951 for validation and 41,242 for testing.**
Our methodology is described in a blog post available in [English](https://blog.vaniila.ai/en/NER_en/) or [French](https://blog.vaniila.ai/NER/).

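As a minimal usage sketch, the model can presumably be loaded through the `transformers` token-classification pipeline. The repository id `CATIE-AQ/NERmembert-large-3entities` is an assumption inferred from the model name and the organisation hosting the dataset, and the helper `group_by_label` and its `min_score` threshold are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


def group_by_label(predictions, min_score=0.5):
    """Group aggregated pipeline predictions into {label: [surface forms]}.

    `predictions` is the list of dicts returned by a token-classification
    pipeline created with an aggregation strategy (keys: entity_group,
    score, word, start, end). `min_score` is an illustrative threshold.
    """
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"].strip())
    return dict(grouped)


def load_ner(model_id="CATIE-AQ/NERmembert-large-3entities"):
    """Build the pipeline; the default model id is an assumption."""
    # Imported lazily so group_by_label works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word tokens into entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        tokenizer=model_id,
        aggregation_strategy="simple",
    )


# Usage (downloads the weights on the first call):
#   ner = load_ner()
#   print(group_by_label(ner("Sylvain Boyer a présenté le logo au Grand Rex.")))
```

Grouping by `entity_group` keeps the three labels (PER, LOC, ORG) separate, which matches how the results are reported per entity in this card.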
## Dataset

The dataset used is [frenchNER_3entities](https://huggingface.co/datasets/CATIE-AQ/frenchNER_3entities), which contains ~420k sentences labeled with 4 categories: the three entity types below, plus `O` for tokens outside any entity.

| Label | Examples |
|:------|:-----------------------------------------------------------|
| PER | "La Bruyère", "Gaspard de Coligny", "Wittgenstein" |
| ORG | "UTBM", "American Airlines", "id Software" |
| LOC | "République du Cap-Vert", "Créteil", "Bordeaux" |

The distribution of the entities is as follows:

| Splits | O | PER | LOC | ORG |
|---|---|---|---|---|
| train | 8,398,765 | 327,393 | 303,722 | 151,490 |
| validation | 592,815 | 34,127 | 30,279 | 18,743 |
| test | 773,871 | 43,634 | 39,195 | 21,391 |

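To make the class imbalance in the distribution above easier to read, the counts can be converted into per-split shares. The sketch below only reuses the numbers from the table; the function name `label_shares` is introduced here for illustration. On the train split, `O` accounts for roughly 91% of all labels.

```python
# Label counts per split, copied from the distribution table above.
counts = {
    "train": {"O": 8_398_765, "PER": 327_393, "LOC": 303_722, "ORG": 151_490},
    "validation": {"O": 592_815, "PER": 34_127, "LOC": 30_279, "ORG": 18_743},
    "test": {"O": 773_871, "PER": 43_634, "LOC": 39_195, "ORG": 21_391},
}


def label_shares(split_counts):
    """Return each label's share of all labels in one split."""
    total = sum(split_counts.values())
    return {label: n / total for label, n in split_counts.items()}


shares = {split: label_shares(c) for split, c in counts.items()}
for split, split_shares in shares.items():
    # Print the shares as percentages with one decimal place.
    print(split, {label: f"{100 * v:.1f}%" for label, v in split_shares.items()})
```

In every split ORG is the rarest entity, followed by LOC and then PER.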

| Model | PER | LOC | ORG |
|---|---|---|---|
| Jean-Baptiste/camembert-ner | 0.941 | 0.883 | 0.658 |
| cmarkea/distilcamembert-base-ner | 0.942 | 0.882 | 0.647 |
| NERmembert-base-3entities | 0.966 | 0.940 | 0.876 |
| NERmembert-large-3entities (this model) | 0.969 | 0.947 | 0.890 |
| NERmembert-base-4entities | 0.951 | 0.894 | 0.671 |
| NERmembert-large-4entities | 0.958 | 0.901 | 0.685 |

| Model | Metrics | PER | LOC | ORG | O | Overall |
|---|---|---|---|---|---|---|
| Jean-Baptiste/camembert-ner | Precision | 0.918 | 0.860 | 0.831 | 0.992 | 0.974 |
| | Recall | 0.964 | 0.908 | 0.544 | 0.964 | 0.948 |
| | F1 | 0.941 | 0.883 | 0.658 | 0.978 | 0.961 |
| cmarkea/distilcamembert-base-ner | Precision | 0.929 | 0.861 | 0.813 | 0.991 | 0.974 |
| | Recall | 0.956 | 0.905 | 0.956 | 0.965 | 0.948 |
| | F1 | 0.942 | 0.882 | 0.647 | 0.978 | 0.961 |
| NERmembert-base-3entities | Precision | 0.961 | 0.935 | 0.877 | 0.995 | 0.986 |
| | Recall | 0.972 | 0.946 | 0.876 | 0.994 | 0.986 |
| | F1 | 0.966 | 0.940 | 0.876 | 0.994 | 0.986 |
| NERmembert-large-3entities (this model) | Precision | 0.966 | 0.944 | 0.884 | 0.996 | 0.987 |
| | Recall | 0.972 | 0.950 | 0.896 | 0.994 | 0.987 |
| | F1 | 0.969 | 0.947 | 0.890 | 0.995 | 0.987 |
| NERmembert-base-4entities | Precision | 0.946 | 0.884 | 0.859 | 0.993 | 0.971 |
| | Recall | 0.955 | 0.904 | 0.550 | 0.993 | 0.971 |
| | F1 | 0.951 | 0.894 | 0.671 | 0.988 | 0.971 |
| NERmembert-large-4entities | Precision | 0.955 | 0.896 | 0.866 | 0.983 | 0.974 |
| | Recall | 0.960 | 0.906 | 0.567 | 0.994 | 0.974 |
| | F1 | 0.958 | 0.901 | 0.685 | 0.988 | 0.974 |

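Each F1 value in the detailed table should be the harmonic mean of the precision and recall reported in the same column. As a quick check, the sketch below recomputes the ORG score of NERmembert-large-3entities from its precision (0.884) and recall (0.896), which reproduces the reported 0.890 after rounding. The function name `f1_score` is local to this example and is not taken from any evaluation library.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ORG column of NERmembert-large-3entities in the table above.
org_precision, org_recall = 0.884, 0.896
org_f1 = round(f1_score(org_precision, org_recall), 3)
print(f"{org_f1:.3f}")
```

The same check can be applied to any other cell of the detailed tables, keeping in mind that the published precision and recall are themselves rounded to three decimals.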

| Model | PER | LOC | ORG |
|---|---|---|---|
| Jean-Baptiste/camembert-ner | 0.940 | 0.761 | 0.723 |
| cmarkea/distilcamembert-base-ner | 0.921 | 0.748 | 0.694 |
| NERmembert-base-3entities | 0.960 | 0.887 | 0.876 |
| NERmembert-large-3entities (this model) | 0.965 | 0.902 | 0.896 |
| NERmembert-base-4entities | 0.960 | 0.890 | 0.867 |
| NERmembert-large-4entities | 0.969 | 0.919 | 0.904 |

| Model | Metrics | PER | LOC | ORG | O | Overall |
|---|---|---|---|---|---|---|
| Jean-Baptiste/camembert-ner | Precision | 0.908 | 0.717 | 0.753 | 0.987 | 0.947 |
| | Recall | 0.975 | 0.811 | 0.696 | 0.878 | 0.880 |
| | F1 | 0.940 | 0.761 | 0.723 | 0.929 | 0.912 |
| cmarkea/distilcamembert-base-ner | Precision | 0.885 | 0.738 | 0.737 | 0.983 | 0.943 |
| | Recall | 0.960 | 0.759 | 0.655 | 0.882 | 0.877 |
| | F1 | 0.921 | 0.748 | 0.694 | 0.930 | 0.909 |
| NERmembert-base-3entities | Precision | 0.957 | 0.894 | 0.876 | 0.986 | 0.972 |
| | Recall | 0.962 | 0.880 | 0.878 | 0.985 | 0.972 |
| | F1 | 0.960 | 0.887 | 0.876 | 0.985 | 0.972 |
| NERmembert-large-3entities (this model) | Precision | 0.960 | 0.903 | 0.916 | 0.987 | 0.976 |
| | Recall | 0.969 | 0.900 | 0.877 | 0.987 | 0.976 |
| | F1 | 0.965 | 0.902 | 0.896 | 0.987 | 0.976 |
| NERmembert-base-4entities | Precision | 0.954 | 0.893 | 0.851 | 0.988 | 0.972 |
| | Recall | 0.967 | 0.887 | 0.883 | 0.984 | 0.972 |
| | F1 | 0.960 | 0.890 | 0.867 | 0.986 | 0.972 |
| NERmembert-large-4entities | Precision | 0.964 | 0.922 | 0.904 | 0.990 | 0.978 |
| | Recall | 0.975 | 0.917 | 0.904 | 0.988 | 0.978 |
| | F1 | 0.969 | 0.919 | 0.904 | 0.989 | 0.978 |


| Model | PER | LOC | ORG |
|---|---|---|---|
| Jean-Baptiste/camembert-ner | 0.962 | 0.934 | 0.888 |
| cmarkea/distilcamembert-base-ner | 0.972 | 0.938 | 0.884 |
| NERmembert-base-3entities | 0.985 | 0.973 | 0.938 |
| NERmembert-large-3entities (this model) | 0.987 | 0.979 | 0.953 |
| NERmembert-base-4entities | 0.985 | 0.973 | 0.938 |
| NERmembert-large-4entities | 0.987 | 0.976 | 0.948 |

| Model | Metrics | PER | LOC | ORG | O | Overall |
|---|---|---|---|---|---|---|
| Jean-Baptiste/camembert-ner | Precision | 0.931 | 0.893 | 0.827 | 0.999 | 0.988 |
| | Recall | 0.994 | 0.980 | 0.959 | 0.973 | 0.974 |
| | F1 | 0.962 | 0.934 | 0.888 | 0.986 | 0.981 |
| cmarkea/distilcamembert-base-ner | Precision | 0.954 | 0.908 | 0.817 | 0.999 | 0.990 |
| | Recall | 0.991 | 0.969 | 0.963 | 0.975 | 0.975 |
| | F1 | 0.972 | 0.938 | 0.884 | 0.987 | 0.983 |
| NERmembert-base-3entities | Precision | 0.974 | 0.965 | 0.910 | 0.999 | 0.995 |
| | Recall | 0.995 | 0.981 | 0.968 | 0.996 | 0.995 |
| | F1 | 0.985 | 0.973 | 0.938 | 0.998 | 0.995 |
| NERmembert-large-3entities (this model) | Precision | 0.979 | 0.970 | 0.927 | 0.999 | 0.996 |
| | Recall | 0.996 | 0.987 | 0.980 | 0.997 | 0.996 |
| | F1 | 0.987 | 0.979 | 0.953 | 0.998 | 0.996 |
| NERmembert-base-4entities | Precision | 0.976 | 0.961 | 0.910 | 0.999 | 0.995 |
| | Recall | 0.994 | 0.985 | 0.967 | 0.996 | 0.995 |
| | F1 | 0.985 | 0.973 | 0.938 | 0.998 | 0.995 |
| NERmembert-large-4entities | Precision | 0.979 | 0.967 | 0.922 | 0.999 | 0.996 |
| | Recall | 0.996 | 0.986 | 0.974 | 0.974 | 0.996 |
| | F1 | 0.987 | 0.976 | 0.948 | 0.998 | 0.996 |


| Model | PER | LOC | ORG |
|---|---|---|---|
| Jean-Baptiste/camembert-ner | 0.986 | 0.966 | 0.938 |
| cmarkea/distilcamembert-base-ner | 0.983 | 0.964 | 0.925 |
| NERmembert-base-3entities | 0.969 | 0.945 | 0.878 |
| NERmembert-large-3entities (this model) | 0.972 | 0.950 | 0.893 |
| NERmembert-base-4entities | 0.970 | 0.945 | 0.876 |
| NERmembert-large-4entities | 0.975 | 0.953 | 0.896 |

| Model | Metrics | PER | LOC | ORG | O | Overall |
|---|---|---|---|---|---|---|
| Jean-Baptiste/camembert-ner | Precision | 0.986 | 0.962 | 0.925 | 0.999 | 0.994 |
| | Recall | 0.987 | 0.969 | 0.951 | 0.965 | 0.967 |
| | F1 | 0.986 | 0.966 | 0.938 | 0.982 | 0.980 |
| cmarkea/distilcamembert-base-ner | Precision | 0.982 | 0.951 | 0.910 | 0.998 | 0.994 |
| | Recall | 0.985 | 0.963 | 0.940 | 0.966 | 0.967 |
| | F1 | 0.983 | 0.964 | 0.925 | 0.982 | 0.980 |
| NERmembert-base-3entities | Precision | 0.971 | 0.947 | 0.866 | 0.994 | 0.989 |
| | Recall | 0.969 | 0.942 | 0.891 | 0.995 | 0.989 |
| | F1 | 0.969 | 0.945 | 0.878 | 0.995 | 0.989 |
| NERmembert-large-3entities (this model) | Precision | 0.973 | 0.953 | 0.873 | 0.996 | 0.990 |
| | Recall | 0.990 | 0.948 | 0.913 | 0.995 | 0.990 |
| | F1 | 0.972 | 0.950 | 0.893 | 0.996 | 0.990 |
| NERmembert-base-4entities | Precision | 0.970 | 0.944 | 0.872 | 0.955 | 0.988 |
| | Recall | 0.989 | 0.947 | 0.880 | 0.995 | 0.988 |
| | F1 | 0.970 | 0.945 | 0.876 | 0.995 | 0.988 |
| NERmembert-large-4entities | Precision | 0.975 | 0.957 | 0.872 | 0.996 | 0.991 |
| | Recall | 0.975 | 0.949 | 0.922 | 0.996 | 0.991 |
| | F1 | 0.975 | 0.953 | 0.896 | 0.996 | 0.991 |


| Model | PER | LOC | ORG |
|---|---|---|---|
| Jean-Baptiste/camembert-ner | 0.867 | 0.722 | 0.451 |
| cmarkea/distilcamembert-base-ner | 0.862 | 0.722 | 0.451 |
| NERmembert-base-3entities | 0.947 | 0.906 | 0.886 |
| NERmembert-large-3entities (this model) | 0.949 | 0.912 | 0.899 |
| NERmembert-base-4entities | 0.888 | 0.733 | 0.496 |
| NERmembert-large-4entities | 0.905 | 0.741 | 0.511 |

| Model | Metrics | PER | LOC | ORG | O | Overall |
|---|---|---|---|---|---|---|
| Jean-Baptiste/camembert-ner | Precision | 0.862 | 0.700 | 0.864 | 0.867 | 0.832 |
| | Recall | 0.871 | 0.746 | 0.305 | 0.950 | 0.772 |
| | F1 | 0.867 | 0.722 | 0.451 | 0.867 | 0.801 |
| cmarkea/distilcamembert-base-ner | Precision | 0.862 | 0.700 | 0.864 | 0.867 | 0.832 |
| | Recall | 0.871 | 0.746 | 0.305 | 0.950 | 0.772 |
| | F1 | 0.867 | 0.722 | 0.451 | 0.907 | 0.800 |
| NERmembert-base-3entities | Precision | 0.948 | 0.900 | 0.893 | 0.979 | 0.942 |
| | Recall | 0.946 | 0.911 | 0.878 | 0.982 | 0.942 |
| | F1 | 0.947 | 0.906 | 0.886 | 0.980 | 0.942 |
| NERmembert-large-3entities (this model) | Precision | 0.958 | 0.917 | 0.897 | 0.980 | 0.948 |
| | Recall | 0.940 | 0.915 | 0.901 | 0.983 | 0.948 |
| | F1 | 0.949 | 0.912 | 0.899 | 0.983 | 0.948 |
| NERmembert-base-4entities | Precision | 0.895 | 0.727 | 0.903 | 0.766 | 0.794 |
| | Recall | 0.881 | 0.740 | 0.342 | 0.984 | 0.794 |
| | F1 | 0.888 | 0.733 | 0.496 | 0.861 | 0.794 |
| NERmembert-large-4entities | Precision | 0.922 | 0.738 | 0.923 | 0.766 | 0.802 |
| | Recall | 0.888 | 0.743 | 0.353 | 0.988 | 0.802 |
| | F1 | 0.905 | 0.741 | 0.511 | 0.863 | 0.802 |