Memory requirements for local training
Hello everyone, what are the memory requirements to fine-tune this model?
I'm trying to train the large-v2 model locally on my 3090 with 24GB of VRAM, and even with --auto_find_batch_size
I get RuntimeError: No executable batch size found, reached zero.
or run into CUDA OOM.
My workstation is running Ubuntu 22.04, CUDA 11.6, Python 3.9.16, PyTorch 1.13.1, and the run_speech_recognition_seq2seq.py
script from Hugging Face:
python3 run_speech_recognition_seq2seq.py \
--model_name_or_path="openai/whisper-large-v2" \
--dataset_name="mozilla-foundation/common_voice_11_0" \
--dataset_config_name="de" \
--language="german" \
--train_split_name="train+validation" \
--eval_split_name="test" \
--max_steps="5000" \
--output_dir="./whisper-large-v2-de" \
--auto_find_batch_size \
--gradient_accumulation_steps="2" \
--logging_steps="25" \
--learning_rate="1e-5" \
--warmup_steps="500" \
--evaluation_strategy="steps" \
--eval_steps="1000" \
--save_strategy="steps" \
--save_steps="1000" \
--generation_max_length="225" \
--preprocessing_num_workers="1" \
--length_column_name="input_length" \
--max_duration_in_seconds="30" \
--text_column_name="sentence" \
--freeze_feature_encoder="False" \
--report_to="tensorboard" \
--metric_for_best_model="wer" \
--gradient_checkpointing \
--group_by_length \
--fp16 \
--overwrite_output_dir \
--do_train \
--do_eval \
--predict_with_generate \
--use_auth_token
Hey @Go2Device ! I reckon you'll be able to fine-tune the large-v2 model with a 24GB GPU if you use DeepSpeed. It's quite a straightforward extension to the training set-up you've already got; see https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#deepspeed for details.
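Roughly, the change is just installing DeepSpeed and swapping the launcher. A minimal sketch, assuming the ds_config.json from the linked guide sits in your working directory (that config offloads optimizer states from VRAM to CPU RAM via ZeRO) and an illustrative batch size:

pip install deepspeed

deepspeed run_speech_recognition_seq2seq.py \
    --deepspeed="ds_config.json" \
    --model_name_or_path="openai/whisper-large-v2" \
    --per_device_train_batch_size="16"
    # ...plus the rest of the flags from the command above, dropping
    # --auto_find_batch_size in favour of the explicit batch size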
Hey @sanchit-gandhi ! Thank you for this hint. Now my CPU RAM (48GB) is not enough.
But thankfully this is cheaper to fix than a bigger GPU.
For now, I've paused this project and will take another look in a few months.
Hey @Go2Device ! What error are you getting regarding CPU RAM? It might be that we need to reduce the dataset's writer_batch_size to a lower value (lower value = less CPU memory but slower processing). Happy to help explore other solutions to reducing CPU RAM! This is the first time I've heard of CPU RAM being a limiting factor for fine-tuning Whisper, so I'm eager to find a solution here!
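As a rough sketch of what I mean (the exact .map() call in run_speech_recognition_seq2seq.py may look slightly different, but it's along these lines):

# The preprocessing step maps prepare_dataset over the raw audio. Passing a
# smaller writer_batch_size flushes processed examples to the on-disk Arrow
# cache more often, so fewer feature arrays sit in CPU RAM at once
# (the datasets default is 1000; lower = less memory, slower preprocessing).
vectorized_datasets = raw_datasets.map(
    prepare_dataset,
    remove_columns=next(iter(raw_datasets.values())).column_names,
    num_proc=data_args.preprocessing_num_workers,
    writer_batch_size=100,
)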
Hello @sanchit-gandhi ! Thank you for your offer to help me with this. Currently another model is training, but when it's finished I will test Whisper again.
Hey @sanchit-gandhi , I am ready to start a new test. I upgraded my RAM from 48 to 64 GB.
For the other project I reinstalled my workstation and am now using Docker for training.
Do you have a working NGC Dockerfile for Whisper and Transformers?
Nvidia's nvcr.io/nvidia/pytorch containers do not include torchaudio, and finding the matching version isn't easy.
Thanks
Hey @Go2Device ! There isn't a Dockerfile, but it's pretty easy to get set up with a pip env! This is a very in-depth guide on how to set up an env for fine-tuning Whisper: https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#set-up-an-environment
You can ignore the bit about installing ffmpeg! It's all handled by datasets now 🤗
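In short, the env set-up boils down to something like this. This is only a sketch: the package list is approximate (the guide has the exact requirements), and the torch/torchaudio pins just match the CUDA 11.6 / PyTorch 1.13.1 versions you mentioned, which also sidesteps the torchaudio-matching problem from the NGC images:

# fresh virtual env so any existing PyTorch install is left untouched
python3 -m venv whisper-env && . whisper-env/bin/activate
pip install --upgrade pip

# torchaudio pinned to the matching torch build (the fiddly part you mentioned)
pip install torch==1.13.1+cu116 torchaudio==0.13.1 \
    --extra-index-url https://download.pytorch.org/whl/cu116

# training dependencies used by run_speech_recognition_seq2seq.py
pip install transformers datasets accelerate evaluate jiwer tensorboard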
Hey @Go2Device ! What is your recommended batch size after using DeepSpeed with 24GB VRAM? How much time does it take to complete?
Thank you bro
Hey @tuanle - you should probably set the batch size in accordance with your device (e.g. keep increasing it in multiples of 2 until you get an OOM). There are some rough speed figures here: https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#recommended-batch-sizes-with-deepspeed
Hi @sanchit-gandhi . The table at https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#recommended-batch-sizes-with-deepspeed does not list values for 24GB. What do you recommend?
Hey @artyomboyko - you'll have to experiment to find what works here! Try bs=32 first. If it OOMs, then drop it to bs=16. If it OOMs again, drop it to bs=8, and so on... Once you go lower than bs=8, it's worth adding gradient accumulation steps so you maintain a reasonable batch size (see https://github.com/huggingface/community-events/tree/main/whisper-fine-tuning-event#recommended-training-configurations)
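For example, if bs=8 turns out to be the largest that fits on the 24GB card, you could replace --auto_find_batch_size in the command above with explicit values like these (the numbers are only illustrative):

    --per_device_train_batch_size="8" \
    --gradient_accumulation_steps="4"
    # effective batch size = 8 x 4 = 32 examples per optimizer step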
Hi @sanchit-gandhi , I have fine-tuned the Whisper model and saved it to a local folder. Now I am facing difficulties trying to load the model; any suggestions would be helpful.
Hey @Sunnnnny - you can load the model from pre-trained by specifying the path to your save folder:
from transformers import WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("/path/to/save/dir")
Alternatively, you can use the pipeline for easy inference:
from transformers import pipeline
asr_pipe = pipeline("automatic-speech-recognition", model="/path/to/save/dir")
asr_pipe("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac")
Could you please share a code-snippet that shows what you've tried and what's not working (i.e. the full traceback)?