metadata
license: apache-2.0
base_model: pszemraj/jamba-H1024_L12-v0.12-fineweb-100k-xlong_16k
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: jamba-H1024_L12-v0.12-fineweb-100k-xlong_16k-knowledge-inoc-concat-v1-vN
    results: []

jamba-H1024_L12-v0.12-fineweb-100k-xlong_16k-knowledge-inoc-concat-v1-vN

This model is a fine-tuned version of pszemraj/jamba-H1024_L12-v0.12-fineweb-100k-xlong_16k on the BEE-spoke-data/knowledge-inoc-concat-v1 dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0366
  • Accuracy: 0.4514
  • Num Input Tokens Seen: 1975517184
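For intuition, the evaluation loss can be converted to a perplexity. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy in nats; the card does not state this explicitly.

```python
import math

# Evaluation loss reported above.
# Assumption: mean per-token cross-entropy, measured in nats.
eval_loss = 3.0366

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"eval perplexity ~ {perplexity:.2f}")
```

Under that assumption the value comes out a little under 21. It is computed on the fine-tuning evaluation split, so it is not directly comparable to the lambada_openai perplexity in the quick eval, which uses a different dataset and task.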

Quick eval

Quick eval for: pszemraj/jamba-H1024_L12-v0.13-KIx2

hf (pretrained=pszemraj/jamba-H1024_L12-v0.13-KIx2,trust_remote_code=True,dtype=float), gen_kwargs: (None), limit: 0.9999, num_fewshot: None, batch_size: 8

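| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| winogrande | 1 | none | 0 | acc | 0.5067 | ± 0.0141 |
| piqa | 1 | none | 0 | acc | 0.5912 | ± 0.0138 |
| | | none | 0 | acc_norm | 0.5951 | ± 0.0138 |
| openbookqa | 1 | none | 0 | acc | 0.1800 | ± 0.0172 |
| | | none | 0 | acc_norm | 0.2920 | ± 0.0204 |
| lambada_openai | 1 | none | 0 | perplexity | 103.1241 | ± 8.5843 |
| | | none | 0 | acc | 0.2502 | ± 0.0122 |
| boolq | 2 | none | 0 | acc | 0.6196 | ± 0.0136 |
| arc_easy | 1 | none | 0 | acc | 0.3836 | ± 0.0137 |
| | | none | 0 | acc_norm | 0.3694 | ± 0.0136 |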
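The settings line and task list above suggest the numbers came from EleutherAI's lm-evaluation-harness. The following is a hedged sketch of how a similar run might be launched from Python, assuming a v0.4-style `lm_eval.simple_evaluate` API. The card does not give the harness version or the exact command, and the helper names `build_model_args` and `run_quick_eval` are hypothetical.

```python
def build_model_args(pretrained: str, trust_remote_code: bool = True, dtype: str = "float") -> str:
    """Build the comma-separated model_args string for the `hf` model type."""
    return f"pretrained={pretrained},trust_remote_code={trust_remote_code},dtype={dtype}"


def run_quick_eval(pretrained: str):
    """Run the tasks listed in the table (hypothetical helper, not the author's script)."""
    # Imported lazily so build_model_args works without lm-eval installed.
    import lm_eval

    return lm_eval.simple_evaluate(
        model="hf",
        model_args=build_model_args(pretrained),
        tasks=["winogrande", "piqa", "openbookqa", "lambada_openai", "boolq", "arc_easy"],
        batch_size=8,
        limit=0.9999,  # fraction of each task's examples, as in the settings line
    )


MODEL_ID = "pszemraj/jamba-H1024_L12-v0.13-KIx2"
model_args = build_model_args(MODEL_ID)
print(model_args)
```

Calling `run_quick_eval(MODEL_ID)` would download the checkpoint and execute its remote code, so it is best tried in an isolated environment. Results may differ from the table depending on harness version and hardware.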

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 80085
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 2.0
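The batch-size entries in the list above are related by simple arithmetic. This sketch assumes the usual Trainer convention that the total batch size is the per-device batch size times the gradient accumulation steps times the number of devices. The device count is not stated in the card, so it is inferred here.

```python
# Values copied from the hyperparameter list.
train_batch_size = 4              # per-device batch size
gradient_accumulation_steps = 32
total_train_batch_size = 128

# Assumption: total = per-device batch * accumulation steps * number of devices.
per_device_effective = train_batch_size * gradient_accumulation_steps
num_devices = total_train_batch_size // per_device_effective

print(f"sequences per optimizer step on one device: {per_device_effective}")
print(f"implied number of devices: {num_devices}")
```

Since 4 × 32 already equals 128, the listed values are consistent with training on a single device.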

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Input Tokens Seen |
|---|---|---|---|---|---|
| 3.2013 | 0.4241 | 200 | 3.0653 | 0.4479 | 419430400 |
| 3.1976 | 0.8481 | 400 | 3.0434 | 0.4506 | 838860800 |
| 3.1485 | 1.2722 | 600 | 3.0375 | 0.4513 | 1258291200 |
| 3.1871 | 1.6963 | 800 | 3.0366 | 0.4514 | 1677721600 |
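The token counts in the training results can be cross-checked against the batch size. This is a small sketch that assumes every sequence in a batch has the same fixed length (packed or padded), which the card does not state.

```python
# (step, input tokens seen) pairs copied from the training results.
checkpoints = [
    (200, 419_430_400),
    (400, 838_860_800),
    (600, 1_258_291_200),
    (800, 1_677_721_600),
]
total_train_batch_size = 128       # from the hyperparameter list
final_tokens_seen = 1_975_517_184  # "Num Input Tokens Seen" reported at the top

# Tokens consumed per optimizer step should be constant across checkpoints.
per_step_values = {tokens // step for step, tokens in checkpoints}
assert len(per_step_values) == 1
tokens_per_step = per_step_values.pop()

# Assumption: fixed-length sequences, so this is the training context length.
tokens_per_sequence = tokens_per_step // total_train_batch_size

# Number of optimizer steps implied by the final token count.
total_steps = final_tokens_seen // tokens_per_step

print(tokens_per_step, tokens_per_sequence, total_steps)
```

Each step consumes 2,097,152 tokens, or 16,384 tokens per sequence, which matches the "16k" in the model name. The final token count corresponds to 942 optimizer steps over the 2 epochs.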

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.2.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1