sadrasabouri committed b466f87 (1 parent: 1e2b39a)

Update README.md

Files changed (1)
  1. README.md +8 -19
README.md CHANGED
@@ -32,45 +32,34 @@ model-index:

# Sharif-wav2vec2

- Prior to the usage, you may need to install the below dependencies:

```shell
pip -q install pyctcdecode
python -m pip -q install pypi-kenlm
```

- Then you can use it with:

```python
import tensorflow
import torchaudio
import torch
- import librosa
import numpy as np
- from transformers import AutoProcessor, AutoModelForCTC
-
- processor = AutoProcessor.from_pretrained("SLPL/Sharif-wav2vec2")
- model = AutoModelForCTC.from_pretrained("SLPL/Sharif-wav2vec2")

- speech_array, sampling_rate = torchaudio.load("test.wav")
speech_array = speech_array.squeeze().numpy()
- speech_array = librosa.resample(
-     np.asarray(speech_array),
-     sampling_rate,
-     processor.feature_extractor.sampling_rate)

features = processor(
    speech_array,
    sampling_rate=processor.feature_extractor.sampling_rate,
    return_tensors="pt",
    padding=True)
- input_values = features.input_values
- attention_mask = features.attention_mask
with torch.no_grad():
-     logits = model(input_values, attention_mask=attention_mask).logits
prediction = processor.batch_decode(logits.numpy()).text

print(prediction[0])
```
 

# Sharif-wav2vec2

+ This is the fine-tuned Sharif wav2vec2 model for Farsi (Persian). Before using it, you may need to install the dependencies below:

```shell
pip -q install pyctcdecode
python -m pip -q install pypi-kenlm
```

+ For a quick test, you can use the hosted inference API on Hugging Face (example clips from Common Voice are provided); transcribing a clip may take a while. Alternatively, you can run the model locally with the code below:
+

```python
import tensorflow
import torchaudio
import torch
import numpy as np
from transformers import AutoProcessor, AutoModelForCTC

processor = AutoProcessor.from_pretrained("SLPL/Sharif-wav2vec2")
model = AutoModelForCTC.from_pretrained("SLPL/Sharif-wav2vec2")

+ speech_array, sampling_rate = torchaudio.load("wav2vec2-test.wav")
speech_array = speech_array.squeeze().numpy()

features = processor(
    speech_array,
    sampling_rate=processor.feature_extractor.sampling_rate,
    return_tensors="pt",
    padding=True)
+
with torch.no_grad():
+     logits = model(
+         features.input_values,
+         attention_mask=features.attention_mask).logits
prediction = processor.batch_decode(logits.numpy()).text

print(prediction[0])
```
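
Note that the updated snippet no longer resamples the input, so the WAV file must already match the processor's sampling rate (16 kHz for wav2vec2). If your audio is recorded at another rate, a minimal torchaudio-based sketch like the one below can bring it to the expected rate before the `processor(...)` call; the file name and the idea that your source rate differs are assumptions here, not part of the repository.

```python
import torchaudio

# Hypothetical input file; replace with your own recording.
speech_array, sampling_rate = torchaudio.load("my-recording.wav")

# Resample to the model's expected rate (16 kHz for wav2vec2;
# processor.feature_extractor.sampling_rate holds the exact value).
target_rate = 16000
if sampling_rate != target_rate:
    speech_array = torchaudio.functional.resample(
        speech_array, orig_freq=sampling_rate, new_freq=target_rate)

# Continue as in the README: squeeze to 1-D before calling the processor.
speech_array = speech_array.squeeze().numpy()
```

The pyctcdecode and kenlm installs suggest the processor decodes with an n-gram language model, which is why the snippet passes raw logits to `processor.batch_decode` and reads `.text` from the result rather than doing a plain argmax decode.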