Training details

#1
by anakin87 - opened

Hello and thanks for the good model!

If I understood correctly, after DPO on an English dataset, the model was trained on Italian data.
Can you share more details about this step? I can't find the related script on GitHub...

SWAP Research Group@UNIBA org

Hi, you can find the DPO script here: https://github.com/marcopoli/LLaMAntino-3-ANITA/blob/main/model_adaptation/dpo_llama3.py
and the SFT script here: https://github.com/marcopoli/LLaMAntino-3-ANITA/blob/main/model_adaptation/finetune_llama3.py
Just change "model_name" and "dataset" accordingly. For the adaptation on the Italian language, just use the SFT script on a small portion of an Italian Data (e.g., gsarti/clean_mc4_it) using plain text without chat template, i.e. (<|begin_of_text|> {text} <|eot_id|><|end_of_text|>)

Thanks.
Very informative!

Hi @m-polignano-uniba ,

Was the fine-tuning on the Italian language performed with QLoRA/LoRA or without?

SWAP Research Group@UNIBA org

Yes, we used QLoRA through Unsloth:

  • load_in_4bit=True, r = 64, lora_alpha = 16, target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
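
For reference, a minimal sketch of such a QLoRA setup via Unsloth could look like the following. Only load_in_4bit, r, lora_alpha, and the target modules come from this thread; the remaining values (max_seq_length, lora_dropout, bias, gradient checkpointing) are assumptions.

```python
# A minimal sketch of the QLoRA setup described above, via Unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    max_seq_length=8192,   # assumption
    load_in_4bit=True,     # 4-bit quantized base weights (QLoRA)
)

model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,                   # assumption
    bias="none",                      # assumption
    use_gradient_checkpointing=True,  # assumption
)
```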

During the language adaptation phase, can you share a rough idea of the peak GPU VRAM usage?

In the paper, I read that you used an Nvidia H100 64GB GPU, but further details would be much appreciated.

SWAP Research Group@UNIBA org

Unfortunately, we used an HPC cluster that does not allow us to check VRAM usage during training (mostly because the GPUs are shared). Just a small correction: the graphics card is a custom NVIDIA A100-SXM-64GB (https://www.nvidia.com/it-it/data-center/a100/)

Hi,

Thanks for the great model!

It's unclear to me what pipeline you followed. Based on the message above, it looks like you fine-tuned the Llama-3-Instruct model on raw Italian text, but based on the README it looks like you actually used Italian instruction data. Then, you fine-tuned with DPO on the English dataset. Is this correct, or am I missing something? Thanks!

@antoniox2dos you can find this information in the paper https://arxiv.org/abs/2405.07101

In short (copy-pasting from a recent post of mine):

โš™๏ธ The ๐ญ๐ซ๐š๐ข๐ง๐ข๐ง๐  ๐ฉ๐ซ๐จ๐œ๐ž๐ฌ๐ฌ is quite original and interesting
1๏ธโƒฃ Built on ๐Ÿฆ™ Llama-3-8B-Instruct (not a base model)
2๏ธโƒฃ Fine-tuned on a mix of English instruction datasets (100K prompts, Chat-Error/wizard_alpaca_dolly_orca)
3๏ธโƒฃ Direct Preference Optimization on Maxime Labonne's orpo-dpo-mix-40k (a good collection of English preference datasets, mainly by Argilla)
4๏ธโƒฃ ๐Ÿ‡ฎ๐Ÿ‡น Italian Adaptation: further fine-tuning on 100k examples from clean_mc4_it by Gabriele Sarti
๐Ÿ› ๏ธ All training steps utilized QLoRA (Quantized Low-Rank Adaptation) with Unsloth AI and Hugging Face TRL.
