---
datasets:
- mozilla-foundation/common_voice_11_0
language:
- el
metrics:
- wer
license: apache-2.0
---
|
|
|
# Whisper small adapters model for Greek transcription |
|
We added adapters to the whisper-small model and fine-tuned it on Greek ASR. During training, the base model is frozen and only the adapters are trained. To transcribe Greek, the adapters must be activated; for any other language they can be ignored, and the model behaves like the original whisper-small.
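
As an illustration of this training setup, the sketch below freezes everything except the adapter weights and counts what remains trainable. It assumes the fork tags adapter parameters with "adapter" in their names; the actual parameter naming and the training code used for this model may differ.

```python
from transformers import WhisperForConditionalGenerationWithAdapters

model = WhisperForConditionalGenerationWithAdapters.from_pretrained("voxreality/whisper-small-el-adapters")

# Hypothetical selection rule: keep gradients only for parameters whose
# names contain "adapter" (the fork's real naming may differ).
for name, param in model.named_parameters():
    param.requires_grad = "adapter" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")
```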
|
## How to use |
|
|
|
Start by installing the `transformers` fork that includes the Whisper model with added adapters:
|
```bash
git clone https://gitlab.com/horizon-europe-voxreality/multilingual-translation/speech-translation-demo.git
cd speech-translation-demo
# You might need to switch to the dev branch
pip install -e transformers
```
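
If the installation worked, the adapter-aware model class should be importable. This class does not exist in stock `transformers`, so a failing import means the fork was not picked up:

```python
# Sanity check: this import only succeeds with the forked transformers installed above.
from transformers import WhisperForConditionalGenerationWithAdapters
```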
|
The `use_adapters` argument controls whether the adapters are used during generation. Set it to `True` only for Greek; for any other language set it to `False` so the model falls back to the original Whisper behaviour.
|
|
|
```python
from transformers import WhisperProcessor, WhisperForConditionalGenerationWithAdapters
from datasets import Audio, load_dataset

# load model and processor
processor = WhisperProcessor.from_pretrained("voxreality/whisper-small-el-adapters")
model = WhisperForConditionalGenerationWithAdapters.from_pretrained("voxreality/whisper-small-el-adapters")
forced_decoder_ids = processor.get_decoder_prompt_ids(language="greek", task="transcribe")

# load streaming dataset and read the first audio sample
ds = load_dataset("mozilla-foundation/common_voice_11_0", "el", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
input_speech = next(iter(ds))["audio"]
input_features = processor(input_speech["array"], sampling_rate=input_speech["sampling_rate"], return_tensors="pt").input_features

# generate token ids (set use_adapters to False for languages other than Greek)
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids, use_adapters=True)

# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
```
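
Since the model is evaluated with WER on Common Voice, you can run a quick sanity check over a few streaming samples, continuing from the variables defined above. This is a minimal sketch using the `evaluate` library; the sample count and the lack of text normalization are arbitrary choices here, not the evaluation protocol used for this model:

```python
import evaluate

wer_metric = evaluate.load("wer")

predictions, references = [], []
for sample in ds.take(8):  # a handful of samples for a quick check
    features = processor(sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt").input_features
    ids = model.generate(features, forced_decoder_ids=forced_decoder_ids, use_adapters=True)
    predictions.append(processor.batch_decode(ids, skip_special_tokens=True)[0])
    references.append(sample["sentence"])

print("WER:", wer_metric.compute(predictions=predictions, references=references))
```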
|
|
|
You can also use an HF pipeline: |
|
```python
from transformers import WhisperForConditionalGenerationWithAdapters, pipeline
from datasets import Audio, load_dataset

ds = load_dataset("mozilla-foundation/common_voice_11_0", "el", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
input_speech = next(iter(ds))["audio"]

model = WhisperForConditionalGenerationWithAdapters.from_pretrained("voxreality/whisper-small-el-adapters")

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer="voxreality/whisper-small-el-adapters",
    feature_extractor="voxreality/whisper-small-el-adapters",
    device="cpu",
    batch_size=32,
)

transcription = pipe(input_speech["array"], generate_kwargs={"language": "<|el|>", "task": "transcribe", "use_adapters": True})
```
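
The pipeline can also transcribe an audio file directly; `audio.wav` below is a placeholder path. The pipeline resamples the input to the feature extractor's sampling rate, so the file does not have to be 16 kHz:

```python
# "audio.wav" is a placeholder path for your own recording.
result = pipe("audio.wav", generate_kwargs={"language": "<|el|>", "task": "transcribe", "use_adapters": True})
print(result["text"])
```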