Commit 202a8b5 by voidful
1 parent: 2057efe

Update README.md

  1. README.md +5 -5
README.md CHANGED
@@ -9,7 +9,7 @@ tags:
  - xlsr-fine-tuning-week
  license: apache-2.0
  model-index:
- - name: XLSR Wav2Vec2 Chinese (Taiwan) by Voidful
  results:
  - task:
  name: Speech Recognition
@@ -25,7 +25,7 @@ model-index:
  ---
 
  # Wav2Vec2-Large-XLSR-53-tw-gpt
- Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on zh-tw using the [Common Voice](https://huggingface.co/datasets/common_voice).
  When using this model, make sure that your speech input is sampled at 16kHz.
 
  ## Usage
@@ -48,7 +48,7 @@ model_name = "voidful/wav2vec2-large-xlsr-53-tw-gpt"
  device = "cuda"
  processor_name = "voidful/wav2vec2-large-xlsr-53-tw-gpt"
 
- chars_to_ignore_regex = r"[¥•"#$%&'()*+,-/:;<=>@[\]^_`{|}~⦅⦆「」、 、〃〈〉《》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏﹑﹔·'℃°•·.﹑︰〈〉─《﹖﹣﹂﹁﹔!?。。"#$%&'()*+,﹐-/:;<=>@[\]^_`{|}~⦅⦆「」、、〃》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏..!\\\\"#$%&()*+,\\\\-.\\\\:;<=>?@\\\\[\\\\]\\\\\\\\\\\\/^_`{|}~]"
 
  model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device)
  processor = Wav2Vec2Processor.from_pretrained(processor_name)
@@ -113,7 +113,7 @@ model_name = "voidful/wav2vec2-large-xlsr-53-tw-gpt"
  device = "cuda"
  processor_name = "voidful/wav2vec2-large-xlsr-53-tw-gpt"
 
- chars_to_ignore_regex = r"[¥•"#$%&'()*+,-/:;<=>@[\]^_`{|}~⦅⦆「」、 、〃〈〉《》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏﹑﹔·'℃°•·.﹑︰〈〉─《﹖﹣﹂﹁﹔!?。。"#$%&'()*+,﹐-/:;<=>@[\]^_`{|}~⦅⦆「」、、〃》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏..!\\\\"#$%&()*+,\\\\-.\\\\:;<=>?@\\\\[\\\\]\\\\\\\\\\\\/^_`{|}~]"
 
  model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device)
  processor = Wav2Vec2Processor.from_pretrained(processor_name)
@@ -170,7 +170,7 @@ from transformers import AutoTokenizer, AutoModelWithLMHead
  model_name = "voidful/wav2vec2-large-xlsr-53-tw-gpt"
  device = "cuda"
  processor_name = "voidful/wav2vec2-large-xlsr-53-tw-gpt"
- chars_to_ignore_regex = r"""[¥•"#$%&'()*+,-/:;<=>@[\]^_`{|}~⦅⦆「」、 、〃〈〉《》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏﹑﹔·'℃°•·.﹑︰〈〉─《﹖﹣﹂﹁﹔!?。。"#$%&'()*+,﹐-/:;<=>@[\]^_`{|}~⦅⦆「」、、〃》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏..!\\\\"#$%&()*+,\\\\-.\\\\:;<=>?@\\\\[\\\\]\\\\\\\\\\\\/^_`{|}~]"""
 
  tokenizer = AutoTokenizer.from_pretrained("ckiplab/gpt2-base-chinese")
  gpt_model = AutoModelWithLMHead.from_pretrained("ckiplab/gpt2-base-chinese").to(device)
 
  - xlsr-fine-tuning-week
  license: apache-2.0
  model-index:
+ - name: XLSR Wav2Vec2 Taiwanese Mandarin(zh-tw) by Voidful
  results:
  - task:
  name: Speech Recognition
 
  ---
 
  # Wav2Vec2-Large-XLSR-53-tw-gpt
+ Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on zh-tw using the [Common Voice](https://huggingface.co/datasets/common_voice).
  When using this model, make sure that your speech input is sampled at 16kHz.
 
  ## Usage
 
  device = "cuda"
  processor_name = "voidful/wav2vec2-large-xlsr-53-tw-gpt"
 
+ chars_to_ignore_regex = r"[¥•"#$%&'()*+,-/:;<=>@[\]^_`{|}~⦅⦆「」、 、〃〈〉《》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏﹑﹔·'℃°•·.﹑︰〈〉─《﹖﹣﹂﹁﹔!?。。"#$%&'()*+,﹐-/:;<=>@[\]^_`{|}~⦅⦆「」、、〃》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏..!\\\\\\\\"#$%&()*+,\\\\\\\\-.\\\\\\\\:;<=>?@\\\\\\\\[\\\\\\\\]\\\\\\\\\\\\\\\\\\\\\\\\/^_`{|}~]"
 
  model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device)
  processor = Wav2Vec2Processor.from_pretrained(processor_name)
 
  device = "cuda"
  processor_name = "voidful/wav2vec2-large-xlsr-53-tw-gpt"
 
+ chars_to_ignore_regex = r"[¥•"#$%&'()*+,-/:;<=>@[\]^_`{|}~⦅⦆「」、 、〃〈〉《》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏﹑﹔·'℃°•·.﹑︰〈〉─《﹖﹣﹂﹁﹔!?。。"#$%&'()*+,﹐-/:;<=>@[\]^_`{|}~⦅⦆「」、、〃》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏..!\\\\\\\\"#$%&()*+,\\\\\\\\-.\\\\\\\\:;<=>?@\\\\\\\\[\\\\\\\\]\\\\\\\\\\\\\\\\\\\\\\\\/^_`{|}~]"
 
  model = Wav2Vec2ForCTC.from_pretrained(model_name).to(device)
  processor = Wav2Vec2Processor.from_pretrained(processor_name)
 
  model_name = "voidful/wav2vec2-large-xlsr-53-tw-gpt"
  device = "cuda"
  processor_name = "voidful/wav2vec2-large-xlsr-53-tw-gpt"
+ chars_to_ignore_regex = r"""[¥•"#$%&'()*+,-/:;<=>@[\]^_`{|}~⦅⦆「」、 、〃〈〉《》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏﹑﹔·'℃°•·.﹑︰〈〉─《﹖﹣﹂﹁﹔!?。。"#$%&'()*+,﹐-/:;<=>@[\]^_`{|}~⦅⦆「」、、〃》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏..!\\\\\\\\"#$%&()*+,\\\\\\\\-.\\\\\\\\:;<=>?@\\\\\\\\[\\\\\\\\]\\\\\\\\\\\\\\\\\\\\\\\\/^_`{|}~]"""
 
  tokenizer = AutoTokenizer.from_pretrained("ckiplab/gpt2-base-chinese")
  gpt_model = AutoModelWithLMHead.from_pretrained("ckiplab/gpt2-base-chinese").to(device)
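
Note on the `chars_to_ignore_regex` lines changed above: the first two snippets wrap the character class in `r"..."` even though the class itself contains a bare `"`, which would end the string literal early if pasted into Python. The third snippet uses `r"""..."""`, which avoids this. The sketch below shows the triple-quoted form on a deliberately shortened, hypothetical character class (it is not the README's full list), and the helper name `remove_ignored_chars` is an assumption for illustration. How the README applies the pattern is not visible in this diff, so the `re.sub` call is only a plausible usage.

```python
import re

# Hypothetical, shortened stand-in for the README's much longer character class.
# A triple-quoted raw string lets the class contain both " and ' without escaping.
# Inside the class, "-", "[" and "]" are escaped so they are taken literally.
CHARS_TO_IGNORE_REGEX = r"""[¥•"#$%&'()*+,\-/:;<=>@\[\]^_`{|}~「」、。!?…]"""


def remove_ignored_chars(sentence: str) -> str:
    """Delete every character matched by the class, then trim outer whitespace."""
    return re.sub(CHARS_TO_IGNORE_REGEX, "", sentence).strip()


if __name__ == "__main__":
    print(remove_ignored_chars("「你好」、世界。"))
    print(remove_ignored_chars('He said: "hi" (ok)'))
```

Escaping each metacharacter once inside a raw string is enough; stacking further backslashes, as in the tail of the changed lines, only adds redundant `\` entries to the class.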