jed351 committed on
Commit
9d516ae
1 Parent(s): 2922c06

Update README.md

Files changed (1):
  1. README.md (+4 -19)
README.md CHANGED
@@ -6,39 +6,24 @@ tags:
 - cantonese
 - fill-mask
 license: other
-library_name: bart-base-jax
-co2_eq_emissions:
-  emissions: 6.29
-  source: estimated by using ML CO2 Calculator
-  training_type: second-stage pre-training
-  hardware_used: Google Cloud TPU v4-16
+
 ---
 
 # bart-base-cantonese
 
-This is the Cantonese model of BART base. It is obtained by a second-stage pre-training on the [LIHKG dataset](https://github.com/ayaka14732/lihkg-scraper) based on the [fnlp/bart-base-chinese](https://huggingface.co/fnlp/bart-base-chinese) model.
-
-This project is supported by Cloud TPUs from Google's [TPU Research Cloud](https://sites.research.google/trc/about/) (TRC).
-
-**Note**: To avoid any copyright issues, please do not use this model for any purpose.
+This is the Cantonese model of BART base. It is based on the [Ayaka/bart-base-cantonese](https://huggingface.co/Ayaka/bart-base-cantonese) model.
 
-## GitHub Links
 
-- Dataset: [ayaka14732/lihkg-scraper](https://github.com/ayaka14732/lihkg-scraper)
-- Tokeniser: [ayaka14732/bert-tokenizer-cantonese](https://github.com/ayaka14732/bert-tokenizer-cantonese)
-- Base model: [ayaka14732/bart-base-jax](https://github.com/ayaka14732/bart-base-jax)
-- Pre-training: [ayaka14732/bart-base-cantonese](https://github.com/ayaka14732/bart-base-cantonese)
 
 ## Usage
 
 ```python
 from transformers import BertTokenizer, BartForConditionalGeneration, Text2TextGenerationPipeline
-tokenizer = BertTokenizer.from_pretrained('Ayaka/bart-base-cantonese')
-model = BartForConditionalGeneration.from_pretrained('Ayaka/bart-base-cantonese')
+tokenizer = BertTokenizer.from_pretrained('jed351/bart-zh-hk-wiki')
+model = BartForConditionalGeneration.from_pretrained('jed351/bart-zh-hk-wiki')
 text2text_generator = Text2TextGenerationPipeline(model, tokenizer)
 output = text2text_generator('聽日就要返香港,我激動到[MASK]唔着', max_length=50, do_sample=False)
 print(output[0]['generated_text'].replace(' ', ''))
-# output: 聽日就要返香港,我激動到瞓唔着
 ```
 
 **Note**: Please use the `BertTokenizer` for the model vocabulary. DO NOT use the original `BartTokenizer`.
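A side note on the `print(output[0]['generated_text'].replace(' ', ''))` line in the usage example: `BertTokenizer` decodes Chinese text with spaces between individual characters, so the spaces are stripped before display. A minimal sketch of that clean-up step, where the `raw` string is an illustrative example of decoded output rather than an actual model prediction:

```python
def detokenize(generated_text: str) -> str:
    """Remove the spaces that BertTokenizer inserts between CJK characters."""
    return generated_text.replace(' ', '')

# Illustrative decoded output, with the character-level spacing BertTokenizer produces:
raw = '聽 日 就 要 返 香 港 , 我 激 動 到 瞓 唔 着'
print(detokenize(raw))  # 聽日就要返香港,我激動到瞓唔着
```

Note that this naive approach also removes legitimate spaces inside any Latin-script text, which is acceptable only when the output is purely Chinese.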