sadrasabouri committed · Commit b466f87 · 1 parent: 1e2b39a

Update README.md

README.md (changed)
@@ -32,45 +32,34 @@ model-index:

Old version (lines 32-76):

````diff
 
 # Sharif-wav2vec2
 
-Prior to the usage, you may need to install the below dependencies:
 
 ```shell
 pip -q install pyctcdecode
 python -m pip -q install pypi-kenlm
 ```
 
-
 ```python
 import tensorflow
 import torchaudio
 import torch
-import librosa
 import numpy as np
-from transformers import AutoProcessor, AutoModelForCTC
-
-processor = AutoProcessor.from_pretrained("SLPL/Sharif-wav2vec2")
-model = AutoModelForCTC.from_pretrained("SLPL/Sharif-wav2vec2")
-
-
 
-
-speech_array, sampling_rate = torchaudio.load("test.wav")
 speech_array = speech_array.squeeze().numpy()
-speech_array = librosa.resample(
-    np.asarray(speech_array),
-    sampling_rate,
-    processor.feature_extractor.sampling_rate)
-
 
 features = processor(
     speech_array,
     sampling_rate=processor.feature_extractor.sampling_rate,
     return_tensors="pt",
     padding=True)
-
-attention_mask = features.attention_mask
 with torch.no_grad():
-    logits = model(
     prediction = processor.batch_decode(logits.numpy()).text
 
 print(prediction[0])
````
New version (lines 32-65):

````diff
 
 # Sharif-wav2vec2
 
+This is the fine-tuned version of Sharif Wav2vec2 for Farsi. Prior to the usage, you may need to install the below dependencies:
 
 ```shell
 pip -q install pyctcdecode
 python -m pip -q install pypi-kenlm
 ```
 
+For testing you can use the hoster API at the hugging face (There are provided examples from common voice) it may take a while to transcribe the given voice. Or you can use bellow code for local run:
+
 ```python
 import tensorflow
 import torchaudio
 import torch
 import numpy as np
 
+speech_array, sampling_rate = torchaudio.load("wav2vec2-test.wav")
 speech_array = speech_array.squeeze().numpy()
 
 features = processor(
     speech_array,
     sampling_rate=processor.feature_extractor.sampling_rate,
     return_tensors="pt",
     padding=True)
+
 with torch.no_grad():
+    logits = model(
+        features.input_values,
+        attention_mask=features.attention_mask).logits
     prediction = processor.batch_decode(logits.numpy()).text
 
 print(prediction[0])
````