elgeish/wav2vec2-base-timit-asr


---
language: en
datasets:
- timit_asr
tags:
- audio
- automatic-speech-recognition
license: apache-2.0
widget:
- label: Sample 1 (from LibriSpeech)
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
---

# Wav2Vec2-Base-TIMIT

This model is [facebook/wav2vec2-base](https://huggingface.co/facebook/wav2vec2-base)
fine-tuned on the [timit_asr dataset](https://huggingface.co/datasets/timit_asr).
When using this model, make sure your speech input is sampled at 16 kHz.
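If your recordings use a different sampling rate, resample them before passing them to the processor. The sketch below is a deliberately crude illustration, assuming a mono 1-D NumPy array: it uses linear interpolation (`np.interp`) with no anti-aliasing filter, so for real downsampling a proper resampler such as `torchaudio.functional.resample` or `librosa.resample` is preferable. The name `to_16khz` is hypothetical.

```python
import numpy as np

TARGET_RATE = 16_000  # the model expects 16 kHz input


def to_16khz(speech: np.ndarray, rate: int) -> np.ndarray:
    """Resample a mono 1-D signal to 16 kHz by linear interpolation.

    Crude illustration only: there is no anti-aliasing filter, so
    downsampling can fold high frequencies back into the signal.
    """
    if rate == TARGET_RATE:
        return speech.astype(np.float32, copy=False)
    n_out = int(round(len(speech) * TARGET_RATE / rate))
    t_in = np.arange(len(speech)) / rate    # input sample times in seconds
    t_out = np.arange(n_out) / TARGET_RATE  # output sample times in seconds
    return np.interp(t_out, t_in, speech).astype(np.float32)


one_second_at_48k = np.ones(48_000)
print(len(to_16khz(one_second_at_48k, 48_000)))  # prints 16000
```

The result can then be passed to the processor with `sampling_rate=16000`, as in the usage example.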

## Usage

The model can be used directly (without a language model) as follows:

```python
import torch
from datasets import load_dataset
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_name = "elgeish/wav2vec2-base-timit-asr"
processor = Wav2Vec2Processor.from_pretrained(model_name, do_lower_case=True)
model = Wav2Vec2ForCTC.from_pretrained(model_name)
dataset = load_dataset("timit_asr", split="test[:10]")

def prepare_example(example):
    # Read the raw waveform; TIMIT audio is already sampled at 16 kHz
    example["speech"], _ = sf.read(example["file"])
    return example

dataset = dataset.map(prepare_example, remove_columns=["file"])
# Batch the waveforms, padding each one to the length of the longest
inputs = processor(dataset["speech"], sampling_rate=16000, return_tensors="pt", padding="longest")

with torch.no_grad():
    # Greedy decoding: pick the most likely token at every frame
    predicted_ids = torch.argmax(model(inputs.input_values).logits, dim=-1)
predicted_transcripts = processor.tokenizer.batch_decode(predicted_ids)
for reference, predicted in zip(dataset["text"], predicted_transcripts):
    print("reference:", reference)
    print("predicted:", predicted)
    print("--")
```
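In the snippet above, `torch.argmax` picks one token per audio frame, and `batch_decode` turns those frame-level ids into text. For a CTC tokenizer like the one used by wav2vec2, that means merging consecutive repeats, dropping the blank token (the padding token plays that role), and mapping the word delimiter `|` to a space. The pure-Python sketch below illustrates those steps on a hypothetical five-token vocabulary; it is not the tokenizer's actual implementation, and `ctc_greedy_decode` is a made-up name.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Turn per-frame token ids into text the way greedy CTC decoding does."""
    # 1. Merge runs of the same id emitted on consecutive frames
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop the blank token, which only separates frames
    tokens = [id_to_token[token_id] for token_id in merged if token_id != blank_id]
    # 3. The word delimiter marks the spaces between words
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; id 0 plays the role of the blank/pad token
vocab = {0: "<pad>", 1: "|", 2: "a", 3: "b", 4: "t"}
print(ctc_greedy_decode([2, 2, 0, 4, 4, 1, 1, 3, 0, 0, 2], vocab))  # prints at ba
print(ctc_greedy_decode([3, 0, 3], vocab))  # prints bb
```

The second call shows why the blank exists: without a blank between them, two identical letters on consecutive frames would collapse into one.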

Here's the output of the usage example:

```
reference: The bungalow was pleasantly situated near the shore.
predicted: the bunglow was plesntly situated near the shor
--
reference: Don't ask me to carry an oily rag like that.
predicted: don't ask me to carry an oily rag like that
--
reference: Are you looking for employment?
predicted: are you oking for employment
--
reference: She had your dark suit in greasy wash water all year.
predicted: she had your dark suit in greasy wash water all year
--
reference: At twilight on the twelfth day we'll have Chablis.
predicted: at twilight on the twelfth day we'll have shiple
--
reference: Eating spinach nightly increases strength miraculously.
predicted: eating spanage nightly increases strength moraculously
--
reference: Got a heck of a buy on this, dirt cheap.
predicted: got a heck of a by on this dert cheep
--
reference: The scalloped edge is particularly appealing.
predicted: the scaliped edge iuse particularly appeling
--
reference: A big goat idly ambled through the farmyard.
predicted: a big goat idely ambled through the farmyard
--
reference: This group is secularist and their program tends to be technological.
predicted: this croup is secularist and their program tens to be technological
--
```
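Predictions like these are commonly scored with word error rate (WER): the word-level edit distance between reference and prediction, divided by the number of reference words. The references keep case and punctuation while the predictions are lowercase without punctuation, so the sketch below normalizes both first. It is a minimal illustration, not the evaluation used for this model; `normalize` and `word_error_rate` are hypothetical helpers, and libraries such as `jiwer` provide maintained implementations.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation except apostrophes, and split into words."""
    return re.sub(r"[^a-z' ]", "", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference has no words")
    # dist[j] = edit distance between the ref words seen so far and hyp[:j]
    dist = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        prev_diag, dist[0] = dist[0], i
        for j, hyp_word in enumerate(hyp, start=1):
            best = min(
                dist[j] + 1,                         # deletion
                dist[j - 1] + 1,                     # insertion
                prev_diag + (ref_word != hyp_word),  # substitution or match
            )
            prev_diag, dist[j] = dist[j], best
    return dist[-1] / len(ref)


print(word_error_rate("Are you looking for employment?", "are you oking for employment"))  # prints 0.2
```

Here "oking" counts as one substitution out of five reference words. Note that this character-agnostic metric scores near misses such as "bunglow" the same as completely wrong words.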

## Fine-Tuning Script

The script used to fine-tune this model is available at
[finetune_base_timit_asr.sh](https://github.com/elgeish/transformers/blob/f2b98f876b040bab3c3db8561ec39c1abb2c733c/examples/research_projects/wav2vec2/finetune_base_timit_asr.sh).