open-llama-Instruct

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a rough Transformers-style equivalent is sketched after the list):

  • learning_rate: 2e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 4
  • total_eval_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 1
  • mixed_precision_training: Native AMP
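For reference, a rough Hugging Face TrainingArguments equivalent of the settings above is sketched below. The actual run was driven by the Axolotl config further down (which uses paged_adamw_8bit), so the optimizer and precision flags here are approximations rather than the exact training code.

from transformers import TrainingArguments

# Approximate stand-in for the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="./outputs/out",         # matches output_dir in the Axolotl config
    per_device_train_batch_size=2,      # train_batch_size
    per_device_eval_batch_size=2,       # eval_batch_size
    gradient_accumulation_steps=1,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=1,
    seed=42,
    optim="adamw_torch",                # Adam, betas=(0.9, 0.999), eps=1e-8 (library defaults)
    fp16=True,                          # "Native AMP"; the Axolotl config sets bf16: auto instead
)

With 2 GPUs, a per-device batch size of 2, and no gradient accumulation, the total train batch size comes out to the 4 reported above.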

Training results

Groups               Version  Filter  n-shot  Metric    Value      Stderr
mmlu                       2  none            acc ↑    0.3632   ± 0.0040
- humanities               2  none            acc ↑    0.3411   ± 0.0068
- other                    2  none            acc ↑    0.4078   ± 0.0087
- social sciences          2  none            acc ↑    0.3997   ± 0.0087
- stem                     2  none            acc ↑    0.3165   ± 0.0082
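The table follows the output format of EleutherAI's lm-evaluation-harness. A minimal sketch for reproducing the MMLU numbers is shown below; the harness version, precision, and batch size are assumptions rather than the author's exact invocation.

import lm_eval

# Evaluate the published checkpoint on MMLU via the harness's Python API.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=diabolic6045/open-llama-Instruct,dtype=float16",
    tasks=["mmlu"],
    batch_size=8,
)
print(results["results"]["mmlu"])  # per-group accuracy and stderr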

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.1.2
  • Datasets 3.0.1
  • Tokenizers 0.20.1

Built with Axolotl

Axolotl config (axolotl version 0.4.1):

base_model: meta-llama/Llama-3.2-1B-Instruct

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: diabolic6045/OpenHermes-2.5_alpaca_10
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0
output_dir: ./outputs/out
hub_model_id: diabolic6045/open-llama-Instruct
hf_use_auth_token: true

sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true

wandb_project: open-llama
wandb_entity: 
wandb_watch: all
wandb_name: open-llama
wandb_log_model: 

gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 1

optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: false

warmup_steps: 10
evals_per_epoch: 2
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
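A minimal inference sketch, assuming the checkpoint is published under the hub_model_id above and that the tokenizer ships Llama 3.2's chat template; the prompt and generation settings are placeholders.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "diabolic6045/open-llama-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain instruction tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))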
