---
license: other
library_name: peft
tags:
  - axolotl
  - generated_from_trainer
base_model: meta-llama/Meta-Llama-3-8B
model-index:
  - name: llama3_8b_odia_v2
    results: []
---

Built with Axolotl

See axolotl config

axolotl version: 0.4.0

```yaml
base_model: meta-llama/Meta-Llama-3-8B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: OdiaGenAIdata/culturax-odia
    type: completion
    field: text
dataset_prepared_path:
val_set_size: 0.1
output_dir: ./llama_3_8b_pretrain_v2
hub_model_id: sam2ai/llama3_8b_odia_v2

adapter: qlora
lora_model_dir:

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
#lora_modules_to_save:
#  - embed_tokens
#  - lm_head
lora_fan_in_fan_out:

wandb_project: llama-3-8b-pretrain-odia-plain
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 4
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|end_of_text|>"
save_safetensors: true
```
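
For readers less familiar with Axolotl's keys, the quantization and LoRA portion of the config roughly corresponds to the transformers/peft setup sketched below. This is illustrative only, not the code Axolotl generates; the bfloat16 compute dtype and the `"all-linear"` target-module shorthand are assumptions standing in for `load_in_4bit: true` / `bf16: auto` and `lora_target_linear: true`.

```python
# Illustrative sketch of the 4-bit QLoRA settings from the config above.
# NOTE: not the code Axolotl runs; compute dtype and "all-linear" are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Meta-Llama-3-8B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_4bit: true
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption, matching bf16: auto
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb_config)

lora_config = LoraConfig(
    r=64,                         # lora_r
    lora_alpha=128,               # lora_alpha
    lora_dropout=0.05,            # lora_dropout
    target_modules="all-linear",  # approximates lora_target_linear: true
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```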

# llama3_8b_odia_v2

This model is a QLoRA (PEFT) fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the OdiaGenAIdata/culturax-odia dataset; a sketch of how to load the adapter follows the result below. It achieves the following results on the evaluation set:

- Loss: nan
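
Because this repository holds a PEFT adapter rather than merged weights, the adapter has to be attached to the base model at load time. The following is a minimal, hedged sketch; the repo ids come from the config above, while the dtype, device placement, and prompt are illustrative assumptions.

```python
# Sketch: attach this LoRA adapter to the base model for inference.
# Assumes access to meta-llama/Meta-Llama-3-8B; dtype/device_map are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B"
adapter_id = "sam2ai/llama3_8b_odia_v2"  # hub_model_id from the config above

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "ଓଡ଼ିଆ"  # any Odia text prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```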

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was trained on the OdiaGenAIdata/culturax-odia corpus in plain-text completion format, with a 10% validation split (`val_set_size: 0.1` in the config above).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 128 (derived in the sketch after this list)
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 4
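
For reference, the total batch sizes reported above follow directly from the per-device settings and the device count:

```python
# How the totals above are derived (no library calls, just the arithmetic).
micro_batch_size = 2              # per-device train batch size
gradient_accumulation_steps = 8
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval time

print(total_train_batch_size)  # 128
print(total_eval_batch_size)   # 16
```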

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 13.7841       | 0.0007 | 1    | nan             |
| 0.0           | 0.25   | 384  | nan             |
| 0.0           | 0.5    | 768  | nan             |
| 0.0           | 0.75   | 1152 | nan             |
| 0.0           | 1.0    | 1536 | nan             |
| 0.0           | 1.2362 | 1920 | nan             |
| 0.0           | 1.4862 | 2304 | nan             |
| 0.0           | 1.7362 | 2688 | nan             |
| 0.0           | 1.9862 | 3072 | nan             |
| 0.0           | 2.2220 | 3456 | nan             |
| 0.0           | 2.4720 | 3840 | nan             |
| 0.0           | 2.7220 | 4224 | nan             |
| 0.0           | 2.9720 | 4608 | nan             |
| 0.0           | 3.2078 | 4992 | nan             |
| 0.0           | 3.4578 | 5376 | nan             |
| 0.0           | 3.7078 | 5760 | nan             |
| 0.0           | 3.9578 | 6144 | nan             |

### Framework versions

- PEFT 0.9.0
- Transformers 4.40.0
- Pytorch 2.4.0.dev20240326+rocm6.0
- Datasets 2.15.0
- Tokenizers 0.19.1