Isotonic committed on
Commit 0865abd · verified · 1 Parent(s): 7233ec6
Files changed (1)
  1. README.md +26 -65
README.md CHANGED
@@ -1,11 +1,24 @@
1
  ---
2
- license: mit
3
  base_model: microsoft/mdeberta-v3-base
4
- tags:
5
- - generated_from_trainer
6
  model-index:
7
  - name: mdeberta-v3-base_finetuned_ai4privacy_v2
8
  results: []
9
  ---
10
 
11
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -13,69 +26,17 @@ should probably proofread and complete it, then remove this comment. -->
13
 
14
  # mdeberta-v3-base_finetuned_ai4privacy_v2
15
 
16
- This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the None dataset.
17
  It achieves the following results on the evaluation set:
 
18
  - Loss: 0.0323
19
  - Overall Precision: 0.9636
20
  - Overall Recall: 0.9731
21
  - Overall F1: 0.9683
22
  - Overall Accuracy: 0.9896
23
- - Accountname F1: 0.9998
24
- - Accountnumber F1: 0.9973
25
- - Age F1: 0.9878
26
- - Amount F1: 0.9495
27
- - Bic F1: 0.9932
28
- - Bitcoinaddress F1: 0.9704
29
- - Buildingnumber F1: 0.9648
30
- - City F1: 0.9887
31
- - Companyname F1: 0.9942
32
- - County F1: 0.9940
33
- - Creditcardcvv F1: 0.9820
34
- - Creditcardissuer F1: 0.9985
35
- - Creditcardnumber F1: 0.9570
36
- - Currency F1: 0.8750
37
- - Currencycode F1: 0.9888
38
- - Currencyname F1: 0.7416
39
- - Currencysymbol F1: 0.9819
40
- - Date F1: 0.9295
41
- - Dob F1: 0.8946
42
- - Email F1: 0.9998
43
- - Ethereumaddress F1: 0.9965
44
- - Eyecolor F1: 0.9984
45
- - Firstname F1: 0.9886
46
- - Gender F1: 0.9962
47
- - Height F1: 1.0
48
- - Iban F1: 0.9966
49
- - Ip F1: 0.6284
50
- - Ipv4 F1: 0.8884
51
- - Ipv6 F1: 0.8015
52
- - Jobarea F1: 0.9940
53
- - Jobtitle F1: 0.9973
54
- - Jobtype F1: 0.9970
55
- - Lastname F1: 0.9653
56
- - Litecoinaddress F1: 0.9109
57
- - Mac F1: 0.9992
58
- - Maskednumber F1: 0.9524
59
- - Middlename F1: 0.9347
60
- - Nearbygpscoordinate F1: 1.0
61
- - Ordinaldirection F1: 0.9984
62
- - Password F1: 0.9936
63
- - Phoneimei F1: 0.9998
64
- - Phonenumber F1: 0.9992
65
- - Pin F1: 0.9857
66
- - Prefix F1: 0.9801
67
- - Secondaryaddress F1: 0.9988
68
- - Sex F1: 0.9979
69
- - Ssn F1: 0.9983
70
- - State F1: 0.9944
71
- - Street F1: 0.9953
72
- - Time F1: 0.9974
73
- - Url F1: 1.0
74
- - Useragent F1: 1.0
75
- - Username F1: 0.9966
76
- - Vehiclevin F1: 0.9936
77
- - Vehiclevrm F1: 0.9917
78
- - Zipcode F1: 0.9727
79
 
80
  ## Model description
81
 
@@ -95,11 +56,11 @@ More information needed
95
 
96
  The following hyperparameters were used during training:
97
  - learning_rate: 5e-05
98
- - train_batch_size: 16
99
- - eval_batch_size: 16
100
  - seed: 42
101
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
102
- - lr_scheduler_type: linear
103
  - lr_scheduler_warmup_ratio: 0.2
104
  - num_epochs: 5
105
 
@@ -119,4 +80,4 @@ The following hyperparameters were used during training:
119
  - Transformers 4.35.2
120
  - Pytorch 2.1.0+cu121
121
  - Datasets 2.16.1
122
- - Tokenizers 0.15.0
 
1
  ---
 
2
  base_model: microsoft/mdeberta-v3-base
 
 
3
  model-index:
4
  - name: mdeberta-v3-base_finetuned_ai4privacy_v2
5
  results: []
6
+ datasets:
7
+ - ai4privacy/pii-masking-200k
8
+ - Isotonic/pii-masking-200k
9
+ language:
10
+ - en
11
+ - de
12
+ - fr
13
+ - it
14
+ metrics:
15
+ - accuracy
16
+ - f1
17
+ - precision
18
+ - recall
19
+ library_name: transformers
20
+ pipeline_tag: token-classification
21
+ license: cc-by-nc-4.0
22
  ---
23
 
24
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 
26
 
27
  # mdeberta-v3-base_finetuned_ai4privacy_v2
28
 
29
+ This model is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) on the [ai4privacy/pii-masking-200k](https://huggingface.co/datasets/ai4privacy/pii-masking-200k) dataset.
30
  It achieves the following results on the evaluation set:
31
+
32
  - Loss: 0.0323
33
  - Overall Precision: 0.9636
34
  - Overall Recall: 0.9731
35
  - Overall F1: 0.9683
36
  - Overall Accuracy: 0.9896
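The overall F1 reported above can be cross-checked against the reported precision and recall. The sketch below assumes the overall scores are micro-averaged, in which case F1 is the harmonic mean of the two; the function name `f1_from_pr` is hypothetical and not part of the model card.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation results listed in the model card.
overall_f1 = f1_from_pr(0.9636, 0.9731)
print(round(overall_f1, 4))  # 0.9683, matching the reported Overall F1
```

The agreement suggests the three overall figures are mutually consistent. It says nothing about the per-entity scores this commit removes.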
37
+
38
+ ## Usage
39
+ GitHub Implementation: [Ai4Privacy](https://github.com/Sripaad/ai4privacy)
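The usage section added here only links to an external implementation. As a rough illustration of how a token-classification model like this one might be called, here is a minimal sketch using the `transformers` pipeline. The repository id in `MODEL_ID` is an assumption (committer name plus the model name from this card). The helpers `mask_entities` and `detect_pii` are hypothetical. The label spellings `FIRSTNAME` and `EMAIL` are guessed from the per-entity metric names.

```python
from typing import Dict, List

# Assumed repo id (committer name + model name from this card); adjust if it differs.
MODEL_ID = "Isotonic/mdeberta-v3-base_finetuned_ai4privacy_v2"


def mask_entities(text: str, entities: List[Dict]) -> str:
    """Replace each detected character span with its label, e.g. "[EMAIL]"."""
    masked = text
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = f"[{ent['entity_group']}]"
        masked = masked[: ent["start"]] + label + masked[ent["end"]:]
    return masked


def detect_pii(text: str) -> List[Dict]:
    """Run the pipeline; downloads the model on first use (needs transformers + torch)."""
    from transformers import pipeline  # imported lazily so the helpers above work offline

    tagger = pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",  # merge sub-word tokens into whole spans
    )
    return tagger(text)


# Offline demonstration with hand-written spans in the pipeline's output shape.
sample = "Contact Jane at jane@example.com."
fake_entities = [
    {"entity_group": "FIRSTNAME", "start": 8, "end": 12},
    {"entity_group": "EMAIL", "start": 16, "end": 32},
]
print(mask_entities(sample, fake_entities))  # Contact [FIRSTNAME] at [EMAIL].
```

With network access, `mask_entities(sample, detect_pii(sample))` would apply the same masking to real predictions. The actual label names depend on the model's config.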
40
 
41
  ## Model description
42
 
 
56
 
57
  The following hyperparameters were used during training:
58
  - learning_rate: 5e-05
59
+ - train_batch_size: 32
60
+ - eval_batch_size: 32
61
  - seed: 42
62
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
63
+ - lr_scheduler_type: cosine_with_restarts
64
  - lr_scheduler_warmup_ratio: 0.2
65
  - num_epochs: 5
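For readers who want to reproduce a similar run, the updated hyperparameters can be expressed as `transformers.TrainingArguments` keyword arguments. This is a sketch under assumptions, not the author's training script. The batch sizes are treated as per-device values on a single device, the optimizer is left at the Trainer default, and `TRAINING_KWARGS`, `build_training_args`, and the output directory are hypothetical names.

```python
# Hyperparameters from the updated card, keyed by TrainingArguments parameter names.
TRAINING_KWARGS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 32,  # assumption: card value is per device
    "per_device_eval_batch_size": 32,   # assumption: card value is per device
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-6,
    "lr_scheduler_type": "cosine_with_restarts",
    "warmup_ratio": 0.2,
    "num_train_epochs": 5,
}


def build_training_args(output_dir: str = "./outputs"):
    """Build TrainingArguments from the values above (needs transformers + torch)."""
    from transformers import TrainingArguments  # imported lazily; heavy dependency

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Compared with the previous revision of the card, the batch sizes, the Adam epsilon, and the scheduler type are the values that changed.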
66
 
 
80
  - Transformers 4.35.2
81
  - Pytorch 2.1.0+cu121
82
  - Datasets 2.16.1
83
+ - Tokenizers 0.15.0