---
license: apache-2.0
language: as
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning
- robust-speech-event
datasets:
- common_voice
model-index:
- name: wav2vec2-large-xls-r-300m-assamese
  results: []
---

# wav2vec2-large-xls-r-300m-assamese

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice_7_0 dataset.
It achieves the following results on the evaluation set:

- WER: 0.7954545454545454
- CER: 0.32341269841269843

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

To compute the evaluation parameters

```bash
cd wav2vec2-large-xls-r-300m-odia; python eval.py --model_id ./ --dataset mozilla-foundation/common_voice_7_0 --config or --split test --log_outputs
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-4
- train_batch_size: 16
- eval_batch_size: 8
- seed: not given
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 400
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Wer      |
|:-------------:|:------:|:----:|:---------------:|:------:  |
| 1.584065      | NA     | 400  | 1.584065        | 0.915512 |
| 1.658865      | Na     | 800  | 1.658865        | 0.805096 |
| 1.882352      | NA     | 1200 | 1.882352        | 0.820742 |
| 1.881240      | NA     | 1600 | 1.881240        | 0.810907 |
| 2.159748      | NA     | 2000 | 2.159748        | 0.804202 |
| 1.992871      | NA     | 2400 | 1.992871        | 0.803308 |
| 2.201436      | NA     | 2800 | 2.201436        | 0.802861 |
| 2.165218      | NA     | 3200 | 2.165218        | 0.793920 |
| 2.253643      | NA     | 3600 | 2.253643        | 0.796603 |
| 2.265880      | NA     | 4000 | 2.265880        | 0.790344 |
| 2.293935      | NA     | 4400 | 2.293935        | 0.797050 |
| 2.288851      | NA     | 4800 | 2.288851        | 0.784086 |


### Framework versions

- Transformers 4.11.3
- Pytorch 1.10.0+cu113
- Datasets 1.13.3
- Tokenizers 0.10.3