Wav2Vec2-BERT - Alvin

This model is a fine-tuned version of facebook/w2v-bert-2.0 for Cantonese speech recognition. It achieves a character error rate (CER) of 10.27 on the Common Voice 16 (yue) test set, computed with punctuation removed.
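
For readers who want to reproduce a comparable number, the following is a minimal sketch of a CER computation with punctuation removed. The exact text normalization used for the reported score is not documented here, so the choices below (dropping whitespace and all Unicode punctuation, scoring at corpus level, reporting a percentage) are assumptions, and the example sentence pair is made up.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Drop whitespace and every character in a Unicode punctuation category."""
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    edits = 0
    chars = 0
    for ref, hyp in zip(references, hypotheses):
        ref, hyp = strip_punctuation(ref), strip_punctuation(hyp)
        edits += edit_distance(ref, hyp)
        chars += len(ref)
    return 100.0 * edits / chars


# Made-up example pair, not taken from Common Voice.
refs = ["你好，世界！"]
hyps = ["你好世間"]
print(cer(refs, hyps))  # 1 substitution over 4 reference characters -> 25.0
```

Scoring at corpus level (summing edits before dividing) weights long utterances more than averaging per-utterance rates would; which of the two was used for the figure above is not stated.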

Training and evaluation data

For training, three datasets were used:

  • Common Voice 16 zh-HK and yue train sets
  • CantoMap: Winterstein, Grégoire, Tang, Carmen and Lai, Regine (2020) "CantoMap: a Hong Kong Cantonese MapTask Corpus", in Proceedings of The 12th Language Resources and Evaluation Conference, Marseille: European Language Resources Association, pp. 2899-2906.
  • Cantonese-ASR: Yu, Tiezheng, Frieske, Rita, Xu, Peng, Cahyawijaya, Samuel, Yiu, Cheuk Tung, Lovenia, Holy, Dai, Wenliang, Barezi, Elham, Chen, Qifeng, Ma, Xiaojuan, Shi, Bertram, Fung, Pascale (2022) "Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset". Link: https://arxiv.org/pdf/2201.02419.pdf

Code Example

from transformers import pipeline

# Placeholder path to the audio file to transcribe.
file = "audio.wav"

bert_asr = pipeline(
    "automatic-speech-recognition", model="alvanlii/wav2vec2-BERT-cantonese", device="cuda"
)
text = bert_asr(file)["text"]

Or, using the model and processor directly:

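import torch
import soundfile as sf
from transformers import AutoModelForCTC, Wav2Vec2BertProcessor

model_name = "alvanlii/wav2vec2-BERT-cantonese"
device = "cuda" if torch.cuda.is_available() else "cpu"
file = "audio.wav"  # placeholder path; the audio is assumed to be 16 kHz mono

asr_model = AutoModelForCTC.from_pretrained(model_name).to(device)
processor = Wav2Vec2BertProcessor.from_pretrained(model_name)

audio_input, _ = sf.read(file)

# Extract input features and move them to the same device as the model.
inputs = processor([audio_input], sampling_rate=16_000, return_tensors="pt")
features = inputs.input_features.to(device)

with torch.no_grad():
    logits = asr_model(features).logits

# Greedy CTC decoding: take the most likely token per frame, then decode to text.
predicted_ids = torch.argmax(logits, dim=-1)
predictions = processor.batch_decode(predicted_ids, skip_special_tokens=True)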
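
The last two lines of the second example perform greedy CTC decoding: the most likely token ID is chosen for each frame, and the decoder then merges consecutive repeats and drops the blank token. The sketch below illustrates that collapse step in plain Python. The vocabulary, the frame-level IDs, and the blank ID of 0 are hypothetical and are not taken from this model's tokenizer.

```python
def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse per-frame argmax IDs into a token ID sequence."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Keep an ID only when it starts a new run and is not the blank.
        if idx != previous and idx != blank_id:
            tokens.append(idx)
        previous = idx
    return tokens


# Hypothetical vocabulary and frame-level predictions, for illustration only.
vocab = {0: "<pad>", 1: "你", 2: "好", 3: "呀"}
frame_ids = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
token_ids = ctc_greedy_collapse(frame_ids)
print("".join(vocab[i] for i in token_ids))  # 你好呀
```

A blank between two identical IDs keeps them as separate tokens, which is how CTC can emit a doubled character: `ctc_greedy_collapse([1, 0, 1])` returns `[1, 1]`.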

Training Hyperparameters

  • learning_rate: 5e-5
  • train_batch_size: 4 (on a single RTX 3090)
  • eval_batch_size: 1
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128 (4 per step × 32 accumulation steps)
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_warmup_steps: 1500
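
As a quick check on how these values relate, the sketch below recomputes the effective batch size and shows what a warmup over 1500 steps could look like. Only the warmup step count is listed above; the linear warmup shape and the constant rate after warmup are assumptions made for illustration, and `warmup_lr` is a hypothetical helper, not part of the training code.

```python
train_batch_size = 4
gradient_accumulation_steps = 32
num_devices = 1  # a single GPU, as listed above

# Samples contributing to each optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

learning_rate = 5e-5
warmup_steps = 1500


def warmup_lr(step: int) -> float:
    """Linear warmup from 0 to the peak rate, then constant (assumed schedule)."""
    return learning_rate * min(1.0, step / warmup_steps)


print(total_train_batch_size)  # 128
print(warmup_lr(750))          # halfway through warmup: half the peak rate
print(warmup_lr(3000))         # after warmup: the peak rate
```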

Model size: 608M params (Safetensors, tensor type F32)