See axolotl config

axolotl version: 0.4.1

base_model: microsoft/Phi-3.5-mini-instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: phi_3

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: flydust/CodeGen_snippets_1130_20037_correct
    type: chat_template
    field_messages: conversations
    # The key in the message turn that contains the role. Default is "role".
    message_field_role: from
    # The key in the message turn that contains the content. Default is "content".
    message_field_content: value
    # Optional[Dict[str, List]]. Roles mapping for the messages.
    roles:
      user: ["human", "user"]
      assistant: ["gpt", "assistant", "ai"]
      system: ["system"]


dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: axolotl_out/Phi-3.5-mini-instruct-CodeGen_snippets_1130_20037_correct

sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: CodeGen
wandb_entity:
wandb_watch:
wandb_name: Phi-3.5-mini-instruct-CodeGen_snippets_1130_20037_correct
wandb_log_model:
hub_model_id: flydust/Phi-3.5-mini-instruct-CodeGen_snippets_1130_20037_correct

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
# Disable flash attention
flash_attention: true
# sdp_attention: falses
# eager_attention: true

warmup_ratio: 0.1
evals_per_epoch: 10
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:

Phi-3.5-mini-instruct-CodeGen_snippets_1130_20037_correct

This model is a fine-tuned version of microsoft/Phi-3.5-mini-instruct on the None dataset. It achieves the following results on the evaluation set:

Loss: 0.0841

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 4
total_train_batch_size: 16
total_eval_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 59
num_epochs: 2

Training results

Training Loss	Epoch	Step	Validation Loss
0.3186	0.0034	1	0.2093
0.1593	0.1006	30	0.1003
0.1449	0.2012	60	0.0912
0.1277	0.3018	90	0.0879
0.1453	0.4023	120	0.0873
0.1468	0.5029	150	0.0861
0.1397	0.6035	180	0.0857
0.1499	0.7041	210	0.0845
0.1568	0.8047	240	0.0840
0.1369	0.9053	270	0.0843
0.1214	1.0042	300	0.0840
0.1315	1.1048	330	0.0846
0.1336	1.2054	360	0.0844
0.1114	1.3060	390	0.0844
0.1314	1.4065	420	0.0846
0.1232	1.5071	450	0.0840
0.1454	1.6077	480	0.0834
0.1376	1.7083	510	0.0843
0.1301	1.8089	540	0.0842
0.0966	1.9095	570	0.0841

Framework versions

Transformers 4.45.2
Pytorch 2.5.1+cu124
Datasets 3.0.1
Tokenizers 0.20.3

flydust
/

Phi-3.5-mini-instruct-CodeGen_snippets_1130_20037_correct

Phi-3.5-mini-instruct-CodeGen_snippets_1130_20037_correct

Model description

Intended uses & limitations

Training and evaluation data

Training procedure

Training hyperparameters

Training results

Framework versions

Model tree for flydust/Phi-3.5-mini-instruct-CodeGen_snippets_1130_20037_correct

Evaluation results