Token Classification
GLiNER
PyTorch
ner
pii
vicgalle committed
Commit b92dd95
1 Parent(s): 97fa3a8

Update README.md

Files changed (1)
  1. README.md +53 -3
README.md CHANGED
@@ -1,3 +1,53 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - urchade/synthetic-pii-ner-mistral-v1
+ tags:
+ - gliner
+ - ner
+ - pii
+ ---
+
+ # GLiNER-small-PII
+
+ This model was obtained by fine-tuning gliner-community/gliner_small-v2.5 on the urchade/synthetic-pii-ner-mistral-v1 dataset.
+
+ It can recognize many types of personally identifiable information (PII), including, but not limited to, the following entity types: person, organization, phone number, address, passport number, email, credit card number, social security number, health insurance id number, date of birth, mobile phone number, bank account number, medication, cpf, driver's license number, tax identification number, medical condition, identity card number, national id number, ip address, email address, iban, credit card expiration date, username, health insurance number, registration number, student id number, insurance number, flight number, landline phone number, blood type, cvv, reservation number, digital signature, social media handle, license plate number, cnpj, postal code, passport_number, serial number, vehicle registration number, credit card brand, fax number, visa number, insurance company, identity document number, transaction number, national health insurance number, cvc, birth certificate number, train ticket number, passport expiration date, and social_security_number.
+
+ ## Usage
+
+ ```python
+ from gliner import GLiNER
+
+ # Load the fine-tuned model. The repo id below is an assumption;
+ # replace it with this model's actual id on the Hub if it differs.
+ md = GLiNER.from_pretrained("vicgalle/gliner-small-pii")
+
+ # Sample input (French) containing several kinds of PII.
+ text = """
+ Harilala Rasoanaivo, un homme d'affaires local d'Antananarivo, a enregistré une nouvelle société nommée "Rasoanaivo Enterprises" au Lot II M 92 Antohomadinika. Son numéro est le +261 32 22 345 67, et son adresse électronique est harilala.rasoanaivo@telma.mg. Il a fourni son numéro de sécu 501-02-1234 pour l'enregistrement.
+ """
+
+ # Candidate labels; GLiNER matches spans against these free-form names.
+ labels = [
+     "work",
+     "booking number",
+     "personally identifiable information",
+     "driver licence",
+     "person",
+     "book",
+     "postal address",
+     "company",
+     "actor",
+     "character",
+     "email",
+     "passport number",
+     "SSN",
+     "phone number",
+ ]
+ entities = md.predict_entities(text, labels, threshold=0.1)
+
+ for entity in entities:
+     print(entity["text"], "=>", entity["label"])
+ ```
+
+ ```
+ Harilala Rasoanaivo => person
+ Rasoanaivo Enterprises => company
+ Lot II M 92 Antohomadinika => postal address
+ +261 32 22 345 67 => phone number
+ harilala.rasoanaivo@telma.mg => email
+ 501-02-1234 => SSN
+ ```
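
A common follow-up to the usage example is masking the detected spans. The sketch below is not part of the README; it assumes each entity dict also carries `start` and `end` character offsets and a `score`, as the dicts returned by GLiNER's `predict_entities` do. The `redact` helper, its `min_score` parameter, and the sample data are hypothetical names introduced only for illustration.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict], min_score: float = 0.0) -> str:
    """Replace each detected span with a [LABEL] placeholder (hypothetical helper)."""
    # Keep only entities at or above the requested confidence.
    kept = [e for e in entities if e.get("score", 1.0) >= min_score]
    # Replace from right to left so earlier offsets stay valid.
    for e in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = "[" + e["label"].upper().replace(" ", "_") + "]"
        text = text[: e["start"]] + placeholder + text[e["end"]:]
    return text


# Hand-written entities standing in for model output (illustrative data only).
sample = "Contact Jane Doe at jane@example.com."
sample_entities = [
    {"start": 8, "end": 16, "text": "Jane Doe", "label": "person", "score": 0.9},
    {"start": 20, "end": 36, "text": "jane@example.com", "label": "email", "score": 0.8},
]
print(redact(sample, sample_entities))
```

With real model output, `redact(text, entities)` would be called on the `text` and `entities` variables from the usage example. The helper assumes the spans do not overlap; overlapping predictions would need to be merged or filtered first.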