Commit 79cff9a by obaidtambo
1 parent: 9c7d08b
Updated Usage code
README.md CHANGED
@@ -24,6 +24,11 @@ This repository contains a BERT tokenizer that has been trained on more than 200
 The tokenizer is capable of accurately tokenizing Hinglish text, splitting it into individual tokens that can be used as input to a BERT model. Here is an example of how the tokenizer works:
 
 ```python
+# Load model directly
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained("obaidtambo/hinglish_bert_tokenizer")
+
 example = "aap se kuch keha tha kehte kehte reh gaye"
 tokens = tokenizer.tokenize(example)
 print(tokens)