---
language:
- en
tags:
- token-classification
- address-NER
- NER
- bert-base-uncased
datasets:
- Ultra Fine Entity Typing
metrics:
- Precision
- Recall
- F1 Score
widget:
- text: "Hi, I am Kermit and I live in Berlin"
- text: "It is very difficult to find a house in Berlin, Germany."
- text: "ML6 is a very cool company from Belgium"
- text: "Samuel ppops in a happy plce called Berlin which happens to be Kazakhstan"
- text: "My family and I visited Montreal, Canada last week and the flight from Amsterdam took 9 hours"
---
## City-Country-NER
A `bert-base-uncased` model finetuned on a custom dataset to detect `City` and `Country` names in a given sentence.
### Custom Dataset
We used weak supervision to annotate the [Ultra-Fine Entity Typing](https://www.cs.utexas.edu/~eunsol/html_pages/open_entity.html) dataset with `City` and `Country` labels, then applied extra preprocessing to remove false labels.
The model predicts three tags: `OTHER`, `CITY` and `COUNTRY`.
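
This card does not describe how the weak labels were produced. As a purely illustrative sketch, one common form of weak supervision is gazetteer matching: tag every token that appears in a list of known city or country names and mark everything else as `OTHER`. The `CITIES` and `COUNTRIES` sets and the `weak_label` helper below are hypothetical and are not taken from the original pipeline.

```python
# Hypothetical gazetteers; a real setup would use far larger lists.
CITIES = {"berlin", "montreal", "amsterdam", "london"}
COUNTRIES = {"germany", "canada", "belgium", "kazakhstan"}


def weak_label(sentence):
    """Assign OTHER / CITY / COUNTRY to each whitespace-separated token."""
    labelled = []
    for token in sentence.split():
        # Ignore surrounding punctuation and case when looking up the token.
        key = token.strip(".,!?;:").lower()
        if key in CITIES:
            tag = "CITY"
        elif key in COUNTRIES:
            tag = "COUNTRY"
        else:
            tag = "OTHER"
        labelled.append((token, tag))
    return labelled


print(weak_label("It is very difficult to find a house in Berlin, Germany."))
```

Naive matching of this kind can mislabel ambiguous names, which is the sort of noise a false-label cleanup step would have to handle.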
### How to use the finetuned model?
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_name = "ml6team/bert-base-uncased-city-country-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# "simple" aggregation merges the word pieces of one entity into a single span.
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
nlp("My name is Kermit and I live in London.")
```
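
With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries, each carrying `entity_group`, `score`, `word`, `start` and `end` keys. A minimal sketch of splitting that output into cities and countries follows. The `group_locations` helper, the `min_score` threshold and the sample predictions are illustrative assumptions (the sample is hand-written, not real model output), and the code assumes the `entity_group` values are `CITY` and `COUNTRY`.

```python
def group_locations(entities, min_score=0.5):
    """Collect predicted city and country names from aggregated NER output."""
    found = {"CITY": [], "COUNTRY": []}
    for ent in entities:
        label = ent["entity_group"]
        # Skip other labels and low-confidence spans.
        if label in found and ent["score"] >= min_score:
            found[label].append(ent["word"])
    return found


# Hand-written sample in the pipeline's output format (not real predictions).
sample = [
    {"entity_group": "CITY", "score": 0.99, "word": "montreal", "start": 24, "end": 32},
    {"entity_group": "COUNTRY", "score": 0.98, "word": "canada", "start": 34, "end": 40},
    {"entity_group": "CITY", "score": 0.30, "word": "week", "start": 46, "end": 50},
]
print(group_locations(sample))
```

In practice, `sample` would be replaced by the return value of `nlp(...)`, and the threshold would be tuned on held-out data.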