---
library_name: transformers
language: ja
license: apache-2.0
datasets: reazon-research/reazonspeech
pipeline_tag: feature-extraction
tags:
  - wav2vec2
  - speech
---

# `reazon-research/japanese-wav2vec2-base`

This is a Japanese wav2vec 2.0 Base model pre-trained on the [ReazonSpeech v2.0 corpus](https://huggingface.co/datasets/reazon-research/reazonspeech).

We also release the CTC model [`reazon-research/japanese-wav2vec2-base-rs35kh`](https://huggingface.co/reazon-research/japanese-wav2vec2-base-rs35kh) derived from this model.
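As a quick illustration of how that derived checkpoint might be used for transcription, here is a minimal sketch. It assumes the checkpoint follows the standard `transformers` CTC interface (`AutoProcessor` + `AutoModelForCTC`); the exact usage is not specified in this card, and `"speech.wav"` is a placeholder path.

```python
# Hedged sketch: assumes the rs35kh checkpoint exposes the standard
# transformers CTC interface (AutoProcessor + AutoModelForCTC).
import librosa
import torch
from transformers import AutoModelForCTC, AutoProcessor

processor = AutoProcessor.from_pretrained("reazon-research/japanese-wav2vec2-base-rs35kh")
model = AutoModelForCTC.from_pretrained("reazon-research/japanese-wav2vec2-base-rs35kh")

audio, sr = librosa.load("speech.wav", sr=16_000)  # placeholder audio path
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
with torch.inference_mode():
    logits = model(**inputs).logits

# Greedy CTC decoding: take the most likely token at each frame.
pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids))
```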

## Usage

```python
import librosa
import torch
from transformers import AutoFeatureExtractor, AutoModel

feature_extractor = AutoFeatureExtractor.from_pretrained("reazon-research/japanese-wav2vec2-base")
model = AutoModel.from_pretrained("reazon-research/japanese-wav2vec2-base")

# Load the audio and resample it to 16 kHz, the rate the model expects.
audio_file = "speech.wav"  # path to your audio file
audio, sr = librosa.load(audio_file, sr=16_000)
inputs = feature_extractor(
    audio,
    return_tensors="pt",
    sampling_rate=sr,
)
with torch.inference_mode():
    outputs = model(**inputs)

# Frame-level speech representations: (batch, frames, hidden_size)
hidden_states = outputs.last_hidden_state
```
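The model returns frame-level features. If you need a single utterance-level embedding, one common option (an illustration only, not something this card prescribes) is to mean-pool over the time axis:

```python
# Mean-pool frame features into one utterance embedding.
# This pooling choice is an example, not part of the released model.
embedding = outputs.last_hidden_state.mean(dim=1)  # (batch, hidden_size)
print(embedding.shape)  # e.g. torch.Size([1, 768]) for the Base architecture
```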

## Citation

```bibtex
@misc{reazon-research-japanese-wav2vec2-base,
  title={japanese-wav2vec2-base},
  author={Sasaki, Yuta},
  url={https://huggingface.co/reazon-research/japanese-wav2vec2-base},
  year={2024}
}
```

## License

[Apache License 2.0](https://choosealicense.com/licenses/apache-2.0/)