---
license: apache-2.0
base_model: distilbert-base-multilingual-cased
tags:
- generated_from_trainer
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: distilbert-base-multilingual-cased-pii
  results: []
datasets:
- ai4privacy/pii-masking-300k
pipeline_tag: token-classification

widget:
- text: "My name is Yoni Go and I live in Israel. My phone number is 054-1234567"

inference:
  parameters:
    aggregation_strategy: "first"
---
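Usage:

```python
from transformers import pipeline

# "first" aggregation gives each word the label of its first sub-word token
# and groups adjacent tokens into entity spans.
pipe = pipeline("token-classification", model="yonigo/distilbert-base-multilingual-cased-pii", aggregation_strategy="first")
# Returns a list of detected entity spans for the input text.
pipe("My name is Yoni Go and I live in Israel. My phone number is 054-1234567")
```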
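The pipeline output can be used to redact the detected spans. The sketch below is one possible way to do that, assuming each result is a dict with `entity_group`, `start`, and `end` keys (the shape the `transformers` token-classification pipeline returns when an aggregation strategy is set). The helper name `mask_pii` is hypothetical, and the sample entities are written by hand for illustration; they are not actual predictions from this model, and the real label strings may differ.

```python
from typing import Dict, List


def mask_pii(text: str, entities: List[Dict]) -> str:
    """Replace each detected span in `text` with a `[LABEL]` placeholder."""
    # Process spans right-to-left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


# Hand-written entities in the pipeline's output shape (illustrative only).
sample_text = "My name is Yoni Go and I live in Israel."
sample_entities = [
    {"entity_group": "GIVENNAME1", "start": 11, "end": 15},
    {"entity_group": "COUNTRY", "start": 33, "end": 39},
]
masked = mask_pii(sample_text, sample_entities)
print(masked)
```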
Training code: [GitHub](https://github.com/yonigottesman/pii-model)

# distilbert-base-multilingual-cased-pii

This model is a fine-tuned version of [distilbert-base-multilingual-cased](https://huggingface.co/distilbert-base-multilingual-cased) on the [ai4privacy/pii-masking-300k](https://huggingface.co/datasets/ai4privacy/pii-masking-300k) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0470
- Bod F1: 0.9642
- Building F1: 0.9789
- Cardissuer F1: 0.9697
- City F1: 0.9566
- Country F1: 0.9737
- Date F1: 0.9264
- Driverlicense F1: 0.9633
- Email F1: 0.9833
- Geocoord F1: 0.9654
- Givenname1 F1: 0.8653
- Givenname2 F1: 0.8170
- Idcard F1: 0.9390
- Ip F1: 0.9842
- Lastname1 F1: 0.8495
- Lastname2 F1: 0.7609
- Lastname3 F1: 0.7281
- Pass F1: 0.9247
- Passport F1: 0.9540
- Postcode F1: 0.9808
- Secaddress F1: 0.9732
- Sex F1: 0.9700
- Socialnumber F1: 0.9689
- State F1: 0.9761
- Street F1: 0.9609
- Tel F1: 0.9777
- Time F1: 0.9701
- Title F1: 0.9572
- Username F1: 0.9594
- Precision: 0.9428
- Recall: 0.9582
- F1: 0.9504
- Accuracy: 0.9909
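As a quick sanity check on the overall scores, the reported F1 can be recomputed from the reported precision and recall. This sketch assumes F1 is the usual harmonic mean of the two; the card does not state which evaluation library produced the numbers. Under that assumption the result agrees with the reported 0.9504 to four decimal places.

```python
precision = 0.9428  # overall precision reported in the list above
recall = 0.9582     # overall recall reported in the list above

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))
```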

### Training results

| Training Loss | Epoch | Step | Validation Loss | Bod F1 | Building F1 | Cardissuer F1 | City F1 | Country F1 | Date F1 | Driverlicense F1 | Email F1 | Geocoord F1 | Givenname1 F1 | Givenname2 F1 | Idcard F1 | Ip F1 | Lastname1 F1 | Lastname2 F1 | Lastname3 F1 | Pass F1 | Passport F1 | Postcode F1 | Secaddress F1 | Sex F1 | Socialnumber F1 | State F1 | Street F1 | Tel F1 | Time F1 | Title F1 | Username F1 | Precision | Recall | F1 | Accuracy |
|:-------------:|:-------:|:-----:|:---------------:|:------:|:-----------:|:-------------:|:-------:|:----------:|:-------:|:----------------:|:--------:|:-----------:|:-------------:|:-------------:|:---------:|:------:|:------------:|:------------:|:------------:|:-------:|:-----------:|:-----------:|:-------------:|:------:|:---------------:|:--------:|:---------:|:------:|:-------:|:--------:|:-----------:|:---------:|:------:|:------:|:--------:|
| 0.2604 | 0.3601 | 1000 | 0.1439 | 0.8486 | 0.8928 | 0.0 | 0.6347 | 0.7409 | 0.6650 | 0.4865 | 0.9454 | 0.8685 | 0.4884 | 0.0 | 0.4298 | 0.9051 | 0.4869 | 0.0 | 0.0 | 0.6948 | 0.5073 | 0.7842 | 0.4352 | 0.6765 | 0.7223 | 0.7680 | 0.6802 | 0.8438 | 0.9211 | 0.5403 | 0.8180 | 0.6715 | 0.7248 | 0.6971 | 0.9663 |
| 0.0866 | 0.7202 | 2000 | 0.0707 | 0.9385 | 0.9611 | 0.0 | 0.9027 | 0.9564 | 0.8655 | 0.8200 | 0.9750 | 0.9546 | 0.7057 | 0.2081 | 0.8231 | 0.9689 | 0.6300 | 0.1133 | 0.0 | 0.8483 | 0.8467 | 0.9453 | 0.9564 | 0.9319 | 0.8831 | 0.9450 | 0.9101 | 0.9487 | 0.9529 | 0.8716 | 0.9285 | 0.8700 | 0.8839 | 0.8769 | 0.9839 |
| 0.0659 | 1.0803 | 3000 | 0.0554 | 0.9507 | 0.9705 | 0.0 | 0.9241 | 0.9644 | 0.8952 | 0.8736 | 0.9792 | 0.9280 | 0.8046 | 0.6345 | 0.8698 | 0.9748 | 0.7571 | 0.5305 | 0.0 | 0.8533 | 0.8883 | 0.9659 | 0.9678 | 0.9571 | 0.9209 | 0.9615 | 0.9303 | 0.9617 | 0.9630 | 0.9145 | 0.9455 | 0.9014 | 0.9216 | 0.9114 | 0.9868 |
| 0.0523 | 1.4404 | 4000 | 0.0484 | 0.9553 | 0.9766 | 0.0 | 0.9358 | 0.9677 | 0.9017 | 0.8924 | 0.9758 | 0.9645 | 0.8305 | 0.7005 | 0.8966 | 0.9765 | 0.7978 | 0.5920 | 0.0 | 0.8963 | 0.9195 | 0.9741 | 0.9688 | 0.9644 | 0.9266 | 0.9696 | 0.9421 | 0.9706 | 0.9656 | 0.9301 | 0.9520 | 0.9183 | 0.9325 | 0.9253 | 0.9884 |
| 0.0465 | 1.8005 | 5000 | 0.0467 | 0.9576 | 0.9759 | 0.0 | 0.9400 | 0.9701 | 0.9138 | 0.9209 | 0.9837 | 0.9568 | 0.8423 | 0.7384 | 0.9088 | 0.9835 | 0.8042 | 0.6235 | 0.2139 | 0.8985 | 0.9308 | 0.9711 | 0.9673 | 0.9649 | 0.9450 | 0.9714 | 0.9471 | 0.9708 | 0.9672 | 0.9447 | 0.9532 | 0.9206 | 0.9445 | 0.9324 | 0.9890 |
| 0.0401 | 2.1606 | 6000 | 0.0441 | 0.9629 | 0.9755 | 0.0 | 0.9486 | 0.9700 | 0.9154 | 0.9288 | 0.9809 | 0.9619 | 0.8485 | 0.7652 | 0.9180 | 0.9826 | 0.8231 | 0.6677 | 0.4724 | 0.8883 | 0.9343 | 0.9777 | 0.9734 | 0.9685 | 0.9490 | 0.9733 | 0.9529 | 0.9743 | 0.9672 | 0.9482 | 0.9555 | 0.9300 | 0.9454 | 0.9377 | 0.9895 |
| 0.0401 | 2.5207 | 7000 | 0.0428 | 0.9619 | 0.9769 | 0.0 | 0.9492 | 0.9709 | 0.9206 | 0.9401 | 0.9795 | 0.9615 | 0.8550 | 0.7776 | 0.9274 | 0.9827 | 0.8267 | 0.6742 | 0.5845 | 0.9085 | 0.9427 | 0.9798 | 0.9755 | 0.9690 | 0.9515 | 0.9736 | 0.9557 | 0.9764 | 0.9700 | 0.9479 | 0.9580 | 0.9340 | 0.9491 | 0.9415 | 0.9900 |
| 0.0394 | 2.8808 | 8000 | 0.0420 | 0.9616 | 0.9770 | 0.0 | 0.9481 | 0.9730 | 0.9185 | 0.9451 | 0.9832 | 0.9569 | 0.8526 | 0.7895 | 0.9269 | 0.9852 | 0.8312 | 0.7121 | 0.6234 | 0.9168 | 0.9441 | 0.9778 | 0.9737 | 0.9700 | 0.9514 | 0.9738 | 0.9565 | 0.9751 | 0.9674 | 0.9512 | 0.9562 | 0.9324 | 0.9535 | 0.9429 | 0.9901 |
| 0.0323 | 3.2409 | 9000 | 0.0422 | 0.9575 | 0.9781 | 0.0 | 0.9521 | 0.9725 | 0.9215 | 0.9445 | 0.9787 | 0.9601 | 0.8459 | 0.7863 | 0.9238 | 0.9834 | 0.8189 | 0.7040 | 0.6460 | 0.9117 | 0.9393 | 0.9792 | 0.9748 | 0.9679 | 0.9575 | 0.9746 | 0.9569 | 0.9732 | 0.9688 | 0.9509 | 0.9557 | 0.9336 | 0.9500 | 0.9418 | 0.9899 |
| 0.0313 | 3.6010 | 10000 | 0.0412 | 0.9630 | 0.9784 | 0.0 | 0.9551 | 0.9741 | 0.9235 | 0.9460 | 0.9826 | 0.9646 | 0.8619 | 0.7991 | 0.9277 | 0.9829 | 0.8386 | 0.7306 | 0.6767 | 0.9199 | 0.9454 | 0.9810 | 0.9746 | 0.9692 | 0.9598 | 0.9746 | 0.9589 | 0.9731 | 0.9685 | 0.9547 | 0.9583 | 0.9390 | 0.9527 | 0.9458 | 0.9904 |
| 0.0304 | 3.9611 | 11000 | 0.0404 | 0.9587 | 0.9792 | 0.1333 | 0.9511 | 0.9725 | 0.9219 | 0.9538 | 0.9769 | 0.9578 | 0.8589 | 0.8061 | 0.9255 | 0.9845 | 0.8402 | 0.7395 | 0.6790 | 0.9136 | 0.9479 | 0.9801 | 0.9748 | 0.9698 | 0.9628 | 0.9752 | 0.9581 | 0.9775 | 0.9695 | 0.9501 | 0.9597 | 0.9373 | 0.9540 | 0.9456 | 0.9904 |
| 0.0264 | 4.3212 | 12000 | 0.0416 | 0.9599 | 0.9794 | 0.5 | 0.9547 | 0.9735 | 0.9271 | 0.9557 | 0.9809 | 0.9537 | 0.8510 | 0.8016 | 0.9316 | 0.9816 | 0.8358 | 0.7412 | 0.6877 | 0.9212 | 0.9476 | 0.9779 | 0.9729 | 0.9682 | 0.9611 | 0.9748 | 0.9593 | 0.9742 | 0.9697 | 0.9551 | 0.9590 | 0.9370 | 0.9550 | 0.9459 | 0.9904 |
| 0.0266 | 4.6813 | 13000 | 0.0412 | 0.9629 | 0.9800 | 0.5 | 0.9511 | 0.9697 | 0.9276 | 0.9564 | 0.9826 | 0.9578 | 0.8590 | 0.8078 | 0.9303 | 0.9830 | 0.8423 | 0.7470 | 0.6945 | 0.9162 | 0.9468 | 0.9789 | 0.9713 | 0.9692 | 0.9597 | 0.9748 | 0.9584 | 0.9759 | 0.9698 | 0.9555 | 0.9575 | 0.9355 | 0.9579 | 0.9466 | 0.9905 |
| 0.0236 | 5.0414 | 14000 | 0.0414 | 0.9614 | 0.9786 | 0.6061 | 0.9562 | 0.9736 | 0.9223 | 0.9595 | 0.9821 | 0.9537 | 0.8673 | 0.8108 | 0.9367 | 0.9811 | 0.8422 | 0.7523 | 0.7140 | 0.9190 | 0.9503 | 0.9807 | 0.9679 | 0.9689 | 0.9676 | 0.9750 | 0.9611 | 0.9758 | 0.9699 | 0.9556 | 0.9589 | 0.9426 | 0.9543 | 0.9484 | 0.9907 |
| 0.0221 | 5.4015 | 15000 | 0.0420 | 0.9597 | 0.9797 | 0.6667 | 0.9554 | 0.9734 | 0.9210 | 0.9587 | 0.9832 | 0.9667 | 0.8637 | 0.8121 | 0.9367 | 0.9852 | 0.8449 | 0.7509 | 0.7145 | 0.9178 | 0.9498 | 0.9808 | 0.9746 | 0.9707 | 0.9650 | 0.9746 | 0.9604 | 0.9749 | 0.9692 | 0.9556 | 0.9591 | 0.9405 | 0.9563 | 0.9484 | 0.9906 |
| 0.021 | 5.7616 | 16000 | 0.0421 | 0.9613 | 0.9794 | 0.6667 | 0.9532 | 0.9736 | 0.9287 | 0.9554 | 0.9792 | 0.9599 | 0.8624 | 0.8146 | 0.9334 | 0.9790 | 0.8445 | 0.7534 | 0.7154 | 0.9181 | 0.9487 | 0.9791 | 0.9721 | 0.9691 | 0.9646 | 0.9748 | 0.9534 | 0.9757 | 0.9693 | 0.9561 | 0.9586 | 0.9403 | 0.9545 | 0.9473 | 0.9905 |
| 0.0174 | 6.1217 | 17000 | 0.0433 | 0.9617 | 0.9788 | 0.7879 | 0.9545 | 0.9738 | 0.9241 | 0.9598 | 0.9829 | 0.9589 | 0.8570 | 0.8131 | 0.9369 | 0.9838 | 0.8449 | 0.7581 | 0.7242 | 0.9230 | 0.9488 | 0.9798 | 0.9690 | 0.9691 | 0.9652 | 0.9759 | 0.9563 | 0.9769 | 0.9700 | 0.9556 | 0.9581 | 0.9403 | 0.9563 | 0.9482 | 0.9907 |
| 0.017 | 6.4818 | 18000 | 0.0442 | 0.9623 | 0.9790 | 0.9697 | 0.9566 | 0.9744 | 0.9258 | 0.9608 | 0.9833 | 0.9574 | 0.8565 | 0.8130 | 0.9350 | 0.9845 | 0.8450 | 0.7552 | 0.7329 | 0.9216 | 0.9519 | 0.9800 | 0.9723 | 0.9703 | 0.9675 | 0.9762 | 0.9605 | 0.9775 | 0.9713 | 0.9545 | 0.9582 | 0.9398 | 0.9582 | 0.9489 | 0.9907 |
| 0.017 | 6.8419 | 19000 | 0.0431 | 0.9639 | 0.9778 | 0.9697 | 0.9562 | 0.9738 | 0.9286 | 0.9612 | 0.9842 | 0.9607 | 0.8641 | 0.8160 | 0.9363 | 0.9828 | 0.8481 | 0.7610 | 0.7292 | 0.9198 | 0.9531 | 0.9800 | 0.9757 | 0.9699 | 0.9657 | 0.9751 | 0.9600 | 0.9767 | 0.9705 | 0.9565 | 0.9587 | 0.9414 | 0.9577 | 0.9495 | 0.9909 |
| 0.015 | 7.2020 | 20000 | 0.0438 | 0.9645 | 0.9795 | 0.9091 | 0.9550 | 0.9734 | 0.9295 | 0.9605 | 0.9824 | 0.9605 | 0.8594 | 0.8120 | 0.9382 | 0.9837 | 0.8452 | 0.7571 | 0.7222 | 0.9220 | 0.9540 | 0.9810 | 0.9745 | 0.9700 | 0.9672 | 0.9758 | 0.9599 | 0.9783 | 0.9702 | 0.9551 | 0.9596 | 0.9414 | 0.9576 | 0.9494 | 0.9908 |
| 0.0152 | 7.5621 | 21000 | 0.0451 | 0.9644 | 0.9795 | 0.9697 | 0.9570 | 0.9741 | 0.9271 | 0.9616 | 0.9826 | 0.9597 | 0.8649 | 0.8121 | 0.9374 | 0.9848 | 0.8469 | 0.7612 | 0.7261 | 0.9231 | 0.9530 | 0.9809 | 0.9747 | 0.9704 | 0.9661 | 0.9756 | 0.9618 | 0.9769 | 0.9706 | 0.9570 | 0.9601 | 0.9427 | 0.9573 | 0.9499 | 0.9908 |
| 0.0137 | 7.9222 | 22000 | 0.0450 | 0.9628 | 0.9780 | 0.9697 | 0.9565 | 0.9742 | 0.9289 | 0.9627 | 0.9832 | 0.9613 | 0.8643 | 0.8169 | 0.9374 | 0.9840 | 0.8497 | 0.7632 | 0.7292 | 0.9234 | 0.9514 | 0.9807 | 0.9737 | 0.9695 | 0.9674 | 0.9758 | 0.9610 | 0.9778 | 0.9701 | 0.9572 | 0.9596 | 0.9420 | 0.9582 | 0.9501 | 0.9908 |
| 0.0122 | 8.2823 | 23000 | 0.0463 | 0.9646 | 0.9789 | 0.9697 | 0.9560 | 0.9738 | 0.9276 | 0.9628 | 0.9835 | 0.9602 | 0.8643 | 0.8176 | 0.9386 | 0.9838 | 0.8494 | 0.7638 | 0.7275 | 0.9233 | 0.9519 | 0.9806 | 0.9739 | 0.9696 | 0.9682 | 0.9762 | 0.9604 | 0.9769 | 0.9698 | 0.9577 | 0.9592 | 0.9426 | 0.9578 | 0.9502 | 0.9908 |
| 0.0123 | 8.6424 | 24000 | 0.0459 | 0.9626 | 0.9782 | 0.9697 | 0.9566 | 0.9743 | 0.9276 | 0.9628 | 0.9839 | 0.9613 | 0.8670 | 0.8163 | 0.9394 | 0.9850 | 0.8487 | 0.7635 | 0.7357 | 0.9241 | 0.9539 | 0.9810 | 0.9737 | 0.9701 | 0.9680 | 0.9757 | 0.9617 | 0.9780 | 0.9702 | 0.9574 | 0.9601 | 0.9436 | 0.9578 | 0.9506 | 0.9909 |
| 0.0133 | 9.0025 | 25000 | 0.0462 | 0.9636 | 0.9788 | 0.9697 | 0.9563 | 0.9731 | 0.9273 | 0.9631 | 0.9835 | 0.9625 | 0.8672 | 0.8157 | 0.9393 | 0.9837 | 0.8495 | 0.7609 | 0.7289 | 0.9236 | 0.9541 | 0.9814 | 0.9737 | 0.9698 | 0.9684 | 0.9761 | 0.9618 | 0.9776 | 0.9698 | 0.9570 | 0.9591 | 0.9435 | 0.9574 | 0.9504 | 0.9909 |
| 0.0112 | 9.3626 | 26000 | 0.0467 | 0.9624 | 0.9789 | 0.9697 | 0.9567 | 0.9740 | 0.9243 | 0.9635 | 0.9832 | 0.9654 | 0.8643 | 0.8170 | 0.9375 | 0.9844 | 0.8489 | 0.7603 | 0.7303 | 0.9248 | 0.9534 | 0.9812 | 0.9735 | 0.9701 | 0.9685 | 0.9762 | 0.9617 | 0.9784 | 0.9698 | 0.9563 | 0.9594 | 0.9428 | 0.9576 | 0.9501 | 0.9909 |
| 0.0116 | 9.7227 | 27000 | 0.0464 | 0.9628 | 0.9789 | 0.9697 | 0.9562 | 0.9741 | 0.9260 | 0.9633 | 0.9826 | 0.9643 | 0.8637 | 0.8138 | 0.9379 | 0.9843 | 0.8492 | 0.7610 | 0.7278 | 0.9245 | 0.9536 | 0.9808 | 0.9725 | 0.9702 | 0.9686 | 0.9761 | 0.9613 | 0.9778 | 0.9698 | 0.9564 | 0.9591 | 0.9419 | 0.9583 | 0.9500 | 0.9908 |
| 0.011 | 10.0828 | 28000 | 0.0470 | 0.9637 | 0.9790 | 0.9697 | 0.9561 | 0.9736 | 0.9266 | 0.9632 | 0.9831 | 0.9646 | 0.8656 | 0.8160 | 0.9384 | 0.9843 | 0.8494 | 0.7597 | 0.7281 | 0.9239 | 0.9537 | 0.9805 | 0.9731 | 0.9701 | 0.9685 | 0.9759 | 0.9611 | 0.9778 | 0.9698 | 0.9573 | 0.9591 | 0.9423 | 0.9583 | 0.9502 | 0.9909 |
| 0.011 | 10.4429 | 29000 | 0.0469 | 0.9642 | 0.9790 | 0.9697 | 0.9567 | 0.9738 | 0.9267 | 0.9632 | 0.9834 | 0.9654 | 0.8653 | 0.8172 | 0.9393 | 0.9842 | 0.8495 | 0.7609 | 0.7287 | 0.9247 | 0.9544 | 0.9809 | 0.9732 | 0.9699 | 0.9687 | 0.9762 | 0.9614 | 0.9777 | 0.9699 | 0.9574 | 0.9596 | 0.9430 | 0.9581 | 0.9505 | 0.9909 |
| 0.0106 | 10.8030 | 30000 | 0.0470 | 0.9642 | 0.9789 | 0.9697 | 0.9566 | 0.9737 | 0.9264 | 0.9633 | 0.9833 | 0.9654 | 0.8653 | 0.8170 | 0.9390 | 0.9842 | 0.8495 | 0.7609 | 0.7281 | 0.9247 | 0.9540 | 0.9808 | 0.9732 | 0.9700 | 0.9689 | 0.9761 | 0.9609 | 0.9777 | 0.9701 | 0.9572 | 0.9594 | 0.9428 | 0.9582 | 0.9504 | 0.9909 |

### Framework versions

- Transformers 4.41.2
- PyTorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1