Memory requirements for local training
Hello everyone, what are the memory requirements to fine-tune this model?
I'm trying to train the large-v2 model locally on my 3090 with 24GB of VRAM, and even with --auto_find_batch_size
I get RuntimeError: No executable batch size found, reached zero.
or run into CUDA OOM.
My workstation is running Ubuntu 22.04, CUDA 11.6, Python 3.9.16, PyTorch 1.13.1, and the run_speech_recognition_seq2seq.py
script from Hugging Face:
python3 run_speech_recognition_seq2seq.py \
--model_name_or_path="openai/whisper-large-v2" \
--dataset_name="mozilla-foundation/common_voice_11_0" \
--dataset_config_name="de" \
--language="german" \
--train_split_name="train+validation" \
--eval_split_name="test" \
--max_steps="5000" \
--output_dir="./whisper-large-v2-de" \
--auto_find_batch_size \
--gradient_accumulation_steps="2" \
--logging_steps="25" \
--learning_rate="1e-5" \
--warmup_steps="500" \
--evaluation_strategy="steps" \
--eval_steps="1000" \
--save_strategy="steps" \
--save_steps="1000" \
--generation_max_length="225" \
--preprocessing_num_workers="1" \
--length_column_name="input_length" \
--max_duration_in_seconds="30" \
--text_column_name="sentence" \
--freeze_feature_encoder="False" \
--report_to="tensorboard" \
--metric_for_best_model="wer" \
--gradient_checkpointing \
--group_by_length \
--fp16 \
--overwrite_output_dir \
--do_train \
--do_eval \
--predict_with_generate \
--use_auth_token
Hey @Go2Device ! I reckon you'll be able to fine-tune the large-v2 model with a 24GB GPU if you use DeepSpeed. It's quite a straightforward extension to the training set-up you've already got; see https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#deepspeed for details.
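Roughly, the change is just installing DeepSpeed and swapping the launcher. A minimal sketch, assuming the ds_config.json from the linked guide sits in your working directory (that config offloads optimizer states from VRAM to CPU RAM via ZeRO) and an illustrative batch size:

pip install deepspeed

deepspeed run_speech_recognition_seq2seq.py \
    --deepspeed="ds_config.json" \
    --model_name_or_path="openai/whisper-large-v2" \
    --per_device_train_batch_size="16"
    # ...plus the rest of the flags from the command above, dropping
    # --auto_find_batch_size in favour of the explicit batch size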
Hey @sanchit-gandhi ! Thank you for this hint. Now my CPU RAM (48GB) is not enough.
But thankfully this is cheaper to fix than a bigger GPU.
For now, I've paused this project and will take another look in a few months.
Hey @Go2Device ! What error are you getting regarding CPU RAM? It might be that we need to reduce the dataset's writer_batch_size to a lower value (lower value = less CPU memory but slower processing). Happy to help explore other solutions to reducing CPU RAM! This is the first time I've heard of CPU RAM being a limiting factor for fine-tuning Whisper, so I'm eager to find a solution here!
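As a rough sketch of what I mean (the exact .map() call in run_speech_recognition_seq2seq.py may look slightly different, but it's along these lines):

# The preprocessing step maps prepare_dataset over the raw audio. Passing a
# smaller writer_batch_size flushes processed examples to the on-disk Arrow
# cache more often, so fewer feature arrays sit in CPU RAM at once
# (the datasets default is 1000; lower = less memory, slower preprocessing).
vectorized_datasets = raw_datasets.map(
    prepare_dataset,
    remove_columns=next(iter(raw_datasets.values())).column_names,
    num_proc=data_args.preprocessing_num_workers,
    writer_batch_size=100,
)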
Hello @sanchit-gandhi ! Thank you for your offer to help me with this. Currently another model is training, but when it's finished I will test Whisper again.
Hey @sanchit-gandhi , I am ready to start a new test. I upgraded my RAM from 48 to 64 GB.
For the other project I reinstalled my workstation and am now using Docker for training.
Do you have a working NGC Dockerfile for Whisper and Transformers?
Nvidia's nvcr.io/nvidia/pytorch containers do not include torchaudio, and finding the matching version isn't easy.
Thanks
Hey @Go2Device ! There isn't a Dockerfile, but it's pretty easy to get set up with a pip env! This is a very in-depth guide on how to set up an env for fine-tuning Whisper: https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#set-up-an-environment
You can ignore the bit about installing ffmpeg! It's all handled by datasets now 🤗
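In short, the env set-up boils down to something like this. This is only a sketch: the package list is approximate (the guide has the exact requirements), and the torch/torchaudio pins just match the CUDA 11.6 / PyTorch 1.13.1 versions you mentioned, which also sidesteps the torchaudio-matching problem from the NGC images:

# fresh virtual env so any existing PyTorch install is left untouched
python3 -m venv whisper-env && . whisper-env/bin/activate
pip install --upgrade pip

# torchaudio pinned to the matching torch build (the fiddly part you mentioned)
pip install torch==1.13.1+cu116 torchaudio==0.13.1 \
    --extra-index-url https://download.pytorch.org/whl/cu116

# training dependencies used by run_speech_recognition_seq2seq.py
pip install transformers datasets accelerate evaluate jiwer tensorboard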
Hey @Go2Device ! What is your recommended batch size after using DeepSpeed with 24GB VRAM? How much time does it take to complete?
Thank you bro
Hey @tuanle - you should probably set the batch size in accordance with your device (e.g. keep increasing it in multiples of 2 until you get an OOM). There are some rough speed figures here: https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#recommended-batch-sizes-with-deepspeed
Hi @sanchit-gandhi . The table at https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#recommended-batch-sizes-with-deepspeed does not list values for 24GB. What do you recommend?
Hey @artyomboyko - you'll have to experiment to find what works here! Try bs=32 first. If it OOMs, then drop it to bs=16. If it OOMs again, drop it to bs=8, and so on... Once you go lower than bs=8, it's worth adding gradient accumulation steps so you maintain a reasonable batch size (see https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#recommended-training-configurations)
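For example, if bs=8 turns out to be the largest that fits on the 24GB card, you could replace --auto_find_batch_size in the command above with explicit values like these (the numbers are only illustrative):

    --per_device_train_batch_size="8" \
    --gradient_accumulation_steps="4"
    # effective batch size = 8 x 4 = 32 examples per optimizer step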
Hi @sanchit-gandhi , I have fine-tuned the Whisper model and saved it to a local folder. Now I am facing difficulties trying to load the model; any suggestions would be helpful.
Hey @Sunnnnny - you can load the model from pre-trained by specifying the path to your save folder:
from transformers import WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("/path/to/save/dir")
Alternatively, you can use the pipeline for easy inference:
from transformers import pipeline
asr_pipe = pipeline("automatic-speech-recognition", model="/path/to/save/dir")
asr_pipe("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac")
Could you please share a code-snippet that shows what you've tried and what's not working (i.e. the full traceback)?