faster-whisper-large-v3

This is OpenAI's Whisper large-v3 model converted to the CTranslate2 format so it can be used with faster-whisper.

Using

You can choose between monkey-patching faster-whisper 0.9.0 (until upstream adds support for large-v3) or using my fork (which is easier).

Using my fork

First, install it by executing:

pip install -U 'transformers[torch]>=4.35.0' https://github.com/PythonicCafe/faster-whisper/archive/refs/heads/feature/large-v3.zip#egg=faster-whisper
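
If you want to confirm the install worked, one optional sanity check is to inspect the same faster_whisper.utils._MODELS registry that the monkey-patch section below modifies (I'm assuming the fork registers large-v3 there):

python -c "import faster_whisper.utils; print('large-v3' in faster_whisper.utils._MODELS)"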

Then, use it just like regular faster-whisper:

import time

import faster_whisper


filename = "my-audio.mp3"
initial_prompt = "My podcast recording"  # Or `None`
word_timestamps = False
vad_filter = True
temperature = 0.0
language = "pt"
model_size = "large-v3"
device, compute_type = "cuda", "float16"
# or: device, compute_type = "cpu", "float32"

model = faster_whisper.WhisperModel(model_size, device=device, compute_type=compute_type)

segments, transcription_info = model.transcribe(
    filename,
    word_timestamps=word_timestamps,
    vad_filter=vad_filter,
    temperature=temperature,
    language=language,
    initial_prompt=initial_prompt,
)
print(transcription_info)

start_time = time.time()
for segment in segments:
    row = {
        "start": segment.start,
        "end": segment.end,
        "text": segment.text,
    }
    if word_timestamps:
        row["words"] = [
            {"start": word.start, "end": word.end, "word": word.word}
            for word in segment.words
        ]
    print(row)
end_time = time.time()
print(f"Transcription finished in {end_time - start_time:.2f}s")

Monkey-patching faster-whisper 0.9.0

Make sure you have the latest version:

pip install -U 'faster-whisper>=0.9.0'

Then, use it with a few small changes:

import time

import faster_whisper.transcribe
import faster_whisper.utils


# Monkey patch 1 (register large-v3 in the known-models list)
faster_whisper.utils._MODELS["large-v3"] = "turicas/faster-whisper-large-v3"

# Monkey patch 2 (fix Tokenizer.encode: the transformers tokenizer returns the ids list directly)
faster_whisper.transcribe.Tokenizer.encode = lambda self, text: self.tokenizer.encode(text, add_special_tokens=False)

filename = "my-audio.mp3"
initial_prompt = "My podcast recording"  # Or `None`
word_timestamps = False
vad_filter = True
temperature = 0.0
language = "pt"
model_size = "large-v3"
device, compute_type = "cuda", "float16"
# or: device, compute_type = "cpu", "float32"

model = faster_whisper.transcribe.WhisperModel(model_size, device=device, compute_type=compute_type)

# Monkey patch 3 (large-v3 uses 128 mel features instead of 80, so replace the feature extractor)
from faster_whisper.feature_extractor import FeatureExtractor
model.feature_extractor = FeatureExtractor(feature_size=128)

# Monkey patch 4 (use the large-v3 tokenizer from transformers)
from transformers import AutoProcessor
model.hf_tokenizer = AutoProcessor.from_pretrained("openai/whisper-large-v3").tokenizer
model.hf_tokenizer.token_to_id = lambda token: model.hf_tokenizer.convert_tokens_to_ids(token)

segments, transcription_info = model.transcribe(
    filename,
    word_timestamps=word_timestamps,
    vad_filter=vad_filter,
    temperature=temperature,
    language=language,
    initial_prompt=initial_prompt,
)
print(transcription_info)

start_time = time.time()
for segment in segments:
    row = {
        "start": segment.start,
        "end": segment.end,
        "text": segment.text,
    }
    if word_timestamps:
        row["words"] = [
            {"start": word.start, "end": word.end, "word": word.word}
            for word in segment.words
        ]
    print(row)
end_time = time.time()
print(f"Transcription finished in {end_time - start_time:.2f}s")

Converting

If you'd like to convert the model yourself, execute:

pip install -U 'ctranslate2>=3.21.0' 'transformers[torch]>=4.35.0' 'OpenNMT-py==2.*' sentencepiece
ct2-transformers-converter --model openai/whisper-large-v3 --output_dir whisper-large-v3-ct2

The converted files will then be in whisper-large-v3-ct2/.
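
To use the converted model, point WhisperModel at the output directory instead of a model name. A minimal sketch, assuming the fork installed above (which handles the 128-mel feature extractor) and a placeholder audio file:

import faster_whisper

# Load the model from the local conversion output directory
# ("whisper-large-v3-ct2/" is the --output_dir used above)
model = faster_whisper.WhisperModel("whisper-large-v3-ct2/", device="cuda", compute_type="float16")

segments, transcription_info = model.transcribe("my-audio.mp3", language="pt")  # placeholder file name
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")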

License

These files have the same license as the original openai/whisper-large-v3 model: Apache 2.0.
