vasilis committed on
Commit
bd3411a
·
1 Parent(s): 387522c

updates model

Files changed (3)
  1. README.md +6 -9
  2. config.json +1 -1
  3. pytorch_model.bin +1 -1
README.md CHANGED
@@ -25,10 +25,10 @@ model-index:
25
  metrics:
26
  - name: Test WER
27
  type: wer
28
- value: 47.117220
29
  - name: Test CER
30
  type: cer
31
- value: 7.880525
32
  ---
33
 
34
  # Wav2Vec2-Large-XLSR-53-finnish
@@ -88,8 +88,8 @@ import re
88
  test_dataset = load_dataset("common_voice", "fi", split="test") #TODO: replace {lang_id} in your language code here. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
89
  wer = load_metric("wer")
90
 
91
- processor = Wav2Vec2Processor.from_pretrained("vasilis/wav2vec2-large-xlsr-53-finnish") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
92
- model = Wav2Vec2ForCTC.from_pretrained("vasilis/wav2vec2-large-xlsr-53-finnish") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
93
  model.to("cuda")
94
 
95
  chars_to_ignore_regex = "[\,\?\.\!\-\;\:\"\“\%\‘\”\�\']" # TODO: adapt this list to include all special characters you removed from the data
@@ -134,15 +134,12 @@ print("CER: {:2f}".format(100 * wer.compute(predictions=[" ".join(list(entry)) f
134
 
135
  ```
136
 
137
- **Test Result**: 47.117220 %
138
 
139
 
140
  ## Training
141
 
142
 
143
  The Common Voice train dataset was used for training. Also all of `CSS10 Finnish` was used using the normalized transcripts.
144
- The model hasn't converged yet.
145
-
146
-
147
-
148
 
 
25
  metrics:
26
  - name: Test WER
27
  type: wer
28
+ value: 38.335242
29
  - name: Test CER
30
  type: cer
31
+ value: 6.552408
32
  ---
33
 
34
  # Wav2Vec2-Large-XLSR-53-finnish
 
88
  test_dataset = load_dataset("common_voice", "fi", split="test") #TODO: replace {lang_id} in your language code here. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
89
  wer = load_metric("wer")
90
 
91
+ processor = Wav2Vec2Processor.from_pretrained("vasilis/wav2vec2-large-xlsr-53-finnish")
92
+ model = Wav2Vec2ForCTC.from_pretrained("vasilis/wav2vec2-large-xlsr-53-finnish")
93
  model.to("cuda")
94
 
95
  chars_to_ignore_regex = "[\,\?\.\!\-\;\:\"\“\%\‘\”\�\']" # TODO: adapt this list to include all special characters you removed from the data
 
134
 
135
  ```
136
 
137
+ **Test Result**: 38.335242 %
138
 
139
 
140
  ## Training
141
 
142
 
143
  The Common Voice train dataset was used for training. Also all of `CSS10 Finnish` was used using the normalized transcripts.
144
+ After 20000 steps the model was fine-tuned on the Common Voice train and validation sets for 2000 more steps.
 
 
 
145
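The truncated hunk header above suggests that the README's evaluation script obtains CER by joining the characters of each transcript with spaces and passing the result to the WER metric. The following is a minimal sketch of that idea. It uses a pure-Python edit distance as a stand-in for the `datasets` metric, and the helper names `edit_distance`, `wer`, and `cer_via_wer` are hypothetical, not taken from the README.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(references, predictions):
    """Corpus-level word error rate: total word edits / total reference words."""
    errors = sum(edit_distance(r.split(), p.split())
                 for r, p in zip(references, predictions))
    words = sum(len(r.split()) for r in references)
    return errors / words


def cer_via_wer(references, predictions):
    """Character error rate, computed by treating every character as a 'word'.

    Joining characters with spaces and then splitting on whitespace drops the
    original spaces, so this variant ignores word boundaries.
    """
    spaced = lambda s: " ".join(list(s))
    return wer([spaced(r) for r in references],
               [spaced(p) for p in predictions])


refs = ["hyvää päivää"]
preds = ["hyvaa päivää"]
print("WER: {:.2f}".format(100 * wer(refs, preds)))
print("CER: {:.2f}".format(100 * cer_via_wer(refs, preds)))
```

Here one of two words is wrong, while only two of eleven non-space characters differ, which is why a CER is typically much lower than the corresponding WER, as in the metrics reported in this commit.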
 
config.json CHANGED
@@ -1,5 +1,5 @@
1
  {
2
- "_name_or_path": "facebook/wav2vec2-large-xlsr-53",
3
  "activation_dropout": 0.0,
4
  "apply_spec_augment": true,
5
  "architectures": [
 
1
  {
2
+ "_name_or_path": "/speech-data-1/dev/hugging_face_finetuning_week/fi_demo/checkpoints/2020_27_3_v4/checkpoint-15200",
3
  "activation_dropout": 0.0,
4
  "apply_spec_augment": true,
5
  "architectures": [
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:dcc8846b1f384bd0511e6a21a9993e4c38c796eef0f34468bfc31198c084f11f
3
  size 1262056855
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:10069dd469767be123bf30757be12a8d249b99394d8475bd44cb4c671d367131
3
  size 1262056855
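
The `pytorch_model.bin` entry is a Git LFS pointer, so only the `oid` changes here while the size stays the same. As a rough sketch, a downloaded weights file can be checked against such a pointer by comparing its size and SHA-256 digest. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path in the usage comment is an assumption.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in `pointer`."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Read in chunks so a >1 GB checkpoint is never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# The new pointer from this commit, with the diff markers removed.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:10069dd469767be123bf30757be12a8d249b99394d8475bd44cb4c671d367131
size 1262056855
"""

pointer = parse_lfs_pointer(NEW_POINTER)
# Usage (assumes the weights were downloaded to this hypothetical path):
# print(matches_pointer("pytorch_model.bin", pointer))
```

Checking the size first is a cheap early exit, though in this commit both versions share the same size, so only the digest distinguishes them.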