How to fine-tune?
same question as the title ^
Same question. I want to explore an application of these 7B models that I've been thinking about for quite some time.
Looking for the same.
Still experimenting with Colab fine-tuning...
Looking for the same here as well. So far it's amazing...
Seems to be mostly working with the default qlora.py script with one small change.
This line was causing trouble because Mistral's model.config.pad_token_id is None:
"unk_token": tokenizer.convert_ids_to_tokens(
model.config.pad_token_id if model.config.pad_token_id != -1 else tokenizer.pad_token_id
),
Adding a None check seems to fix it:
"unk_token": tokenizer.convert_ids_to_tokens(
model.config.pad_token_id if
model.config.pad_token_id is not None and model.config.pad_token_id != -1
else tokenizer.pad_token_id
),
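For context, that expression sits inside a tokenizer.add_special_tokens(...) call in qlora.py. Here is a paraphrased sketch of the surrounding block with the fix applied (approximate, not a verbatim quote of the script):

# Paraphrased from qlora.py's special-token setup. Mistral ships with
# config.pad_token_id = None, and since None != -1 evaluates to True, the
# unpatched script passed None straight into convert_ids_to_tokens, which fails.
tokenizer.add_special_tokens({
    "eos_token": tokenizer.convert_ids_to_tokens(model.config.eos_token_id),
    "bos_token": tokenizer.convert_ids_to_tokens(model.config.bos_token_id),
    "unk_token": tokenizer.convert_ids_to_tokens(
        model.config.pad_token_id
        if model.config.pad_token_id is not None and model.config.pad_token_id != -1
        else tokenizer.pad_token_id
    ),
})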
Thanks. I'm not even sure yet if it needs fine-tuning; we're running a bunch of tests on it.
It seems to be the best 7B model we have ever seen. It may outperform 13B and possibly 70B models for certain use cases.
It's very fast.
Inference:
-With Flash Attention 2, an inference of approximately 600 tokens takes under 20 seconds, if I remember correctly (see the loading sketch below).
-Without Flash Attention 2, inferences of 100 tokens take more than 440 seconds.
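For reference, a minimal loading sketch for Flash Attention 2 inference. It assumes a recent transformers (the attn_implementation argument landed after 4.34, which used use_flash_attention_2=True instead) and the flash-attn package installed; the model id, prompt, and generation settings are illustrative:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # needs flash-attn installed
    device_map="auto",
)

prompt = "Explain QLoRA in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# ~600 new tokens, matching the timing mentioned above
outputs = model.generate(**inputs, max_new_tokens=600)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))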
Fine-tuning (SFTTrainer):
-GPU: A100 on Colab (40 GB)
-a ~15k-example dataset takes approximately 1 hour
-roughly 14 Colab credits per hour
-per_device_train_batch_size=6 uses about 36 GB of GPU memory (a minimal sketch follows this list)
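A minimal sketch of that SFTTrainer setup on an A100, written against the trl 0.7-era API (newer trl versions move dataset_text_field and max_seq_length into SFTConfig). Everything except per_device_train_batch_size=6 is an assumption: the dataset, LoRA settings, and other hyperparameters are illustrative, not what was used above:

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

model_id = "mistralai/Mistral-7B-v0.1"

# 4-bit QLoRA-style quantization to fit the 7B model in 40 GB alongside a batch of 6
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token (see the fix above)

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

args = TrainingArguments(
    output_dir="mistral-7b-sft",
    per_device_train_batch_size=6,  # ~36 GB on an A100-40GB per the numbers above
    gradient_accumulation_steps=2,
    learning_rate=2e-4,
    num_train_epochs=1,
    logging_steps=10,
    bf16=True,
)

# Illustrative instruction dataset with a plain "text" column
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=1024,
    tokenizer=tokenizer,
)
trainer.train()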
Has anyone made a Colab notebook for fine-tuning?
Seems to be mostly working with the default qlora.py script with one small change.
Using this script and your change seems to work, but I had to pull in the latest transformers (4.34) to get it working.
I also had to re-pull transformers with:
pip install git+https://github.com/huggingface/transformers
Has anyone made a Colab notebook for fine-tuning?
Step-by-step guide to fine-tune on a QA dataset: https://medium.com/me/stats/post/0f7bebccf11c
haha error 404