---
tags:
- cantonese
- fill-mask
license: other
---

# bart-base-cantonese

This is a Cantonese model of BART base. It is based on the model created by Ayaka: https://huggingface.co/Ayaka/bart-base-cantonese

## Usage

```python
from transformers import BertTokenizer, BartForConditionalGeneration, Text2TextGenerationPipeline

tokenizer = BertTokenizer.from_pretrained('jed351/bart-zh-hk-wiki')
model = BartForConditionalGeneration.from_pretrained('jed351/bart-zh-hk-wiki')
text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
output = text2text_generator('聽日就要返香港,我激動到[MASK]唔着', max_length=50, do_sample=False)
print(output[0]['generated_text'].replace(' ', ''))
```

**Note**: Please use the `BertTokenizer` for the model vocabulary. DO NOT use the original `BartTokenizer`.
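
The reason for the note above is that `BertTokenizer` splits Chinese text one character per token, which is also why the generated text comes back space-separated and the usage example strips the spaces with `.replace(' ', '')`. A minimal sketch of that behaviour, using a tiny hand-built vocabulary purely for illustration (the real vocab ships with the checkpoint on the Hub):

```python
import tempfile

from transformers import BertTokenizer

# Tiny character-level vocabulary, built only to illustrate the behaviour;
# the actual model provides its own vocab file.
vocab = ['[PAD]', '[UNK]', '[CLS]', '[SEP]', '[MASK]',
         '聽', '日', '就', '要', '返', '香', '港']
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as f:
    f.write('\n'.join(vocab))
    vocab_file = f.name

tok = BertTokenizer(vocab_file)
# CJK characters are padded with spaces and emitted one per token.
print(tok.tokenize('聽日就要返香港'))
# → ['聽', '日', '就', '要', '返', '香', '港']
```

Because every character is its own token, joining the pipeline output without spaces recovers the original running text.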