---
language:
- en
license: llama3
tags:
- axolotl
base_model: meta-llama/Meta-Llama-3-8B
datasets:
- BEE-spoke-data/KI-smorgasbord_fw-small
pipeline_tag: text-generation
model-index:
- name: Llama-3-6.3b-v0.1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 10.44
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 18.68
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 1.51
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 4.47
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.15
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 20.44
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pszemraj/Llama-3-6.3b-v0.1
      name: Open LLM Leaderboard
---
# Llama-3-6.3b-v0.1
This is a layer-pruning experiment based on the original Llama-3-8B (`meta-llama/Meta-Llama-3-8B`):
- 8 layers pruned with [PruneMe](https://github.com/pszemraj/PruneMe/tree/upgrades)/MergeKit
  - layers selected using [BEE-spoke-data/fineweb-100k_en-med](https://hf.co/datasets/BEE-spoke-data/fineweb-100k_en-med)
- brief continued pretraining afterwards at a context length of 4096
  - data: 10k rows of FineWeb (different from the pruning data) plus some curated data
  - wandb logs [here](https://wandb.ai/pszemraj/llama3-pruning)
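
The "6.3b" in the model name is consistent with simple parameter arithmetic. The sketch below assumes the published `meta-llama/Meta-Llama-3-8B` configuration (32 decoder layers, hidden size 4096, MLP size 14336, 32 attention heads with 8 key/value heads, vocabulary size 128256, untied input and output embeddings) and assumes that pruning removes 8 whole decoder layers while leaving the embeddings untouched. The function name is illustrative only and is not part of PruneMe or MergeKit.

```python
def llama_param_count(num_layers, hidden=4096, intermediate=14336,
                      num_heads=32, num_kv_heads=8, vocab=128256):
    """Count parameters of a Llama-style decoder (assumed config values)."""
    head_dim = hidden // num_heads
    kv_dim = num_kv_heads * head_dim              # grouped-query attention
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim
    mlp = 3 * hidden * intermediate               # gate, up, and down projections
    norms = 2 * hidden                            # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    embeddings = 2 * vocab * hidden               # input embeddings + untied LM head
    final_norm = hidden
    return num_layers * per_layer + embeddings + final_norm


base = llama_param_count(32)        # original depth
pruned = llama_param_count(32 - 8)  # after removing 8 layers
print(f"base:   {base / 1e9:.2f}B parameters")
print(f"pruned: {pruned / 1e9:.2f}B parameters")
```

Under these assumptions each decoder layer holds about 218M parameters, so dropping 8 of them removes roughly 1.7B.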
## quick eval
`hf (pretrained=pszemraj/Llama-3-6.3b-v0.1,trust_remote_code=True,dtype=bfloat16), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1`

| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|--------------|------:|------|-----:|----------|-----:|---|-----:|
|arc_easy | 1|none | 0|acc |0.7109|± |0.0093|
| | |none | 0|acc_norm |0.6843|± |0.0095|
|boolq | 2|none | 0|acc |0.7920|± |0.0071|
|lambada_openai| 1|none | 0|perplexity|4.5411|± |0.1073|
| | |none | 0|acc |0.6734|± |0.0065|
|openbookqa | 1|none | 0|acc |0.3000|± |0.0205|
| | |none | 0|acc_norm |0.4140|± |0.0220|
|piqa | 1|none | 0|acc |0.7443|± |0.0102|
| | |none | 0|acc_norm |0.7530|± |0.0101|
|winogrande | 1|none | 0|acc |0.7127|± |0.0127|
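
To rerun a comparable evaluation, the settings in the header line above can be turned into an `lm_eval` command line (lm-evaluation-harness). This is a minimal sketch, not the exact invocation used for the table: the task list is read off the table rows, and the helper name `build_lm_eval_command` is hypothetical. It only builds and prints the command, so it runs without the harness installed.

```python
import shlex

# Values copied from the eval header line and the table rows above.
MODEL_ARGS = "pretrained=pszemraj/Llama-3-6.3b-v0.1,trust_remote_code=True,dtype=bfloat16"
TASKS = ["arc_easy", "boolq", "lambada_openai", "openbookqa", "piqa", "winogrande"]


def build_lm_eval_command(model_args=MODEL_ARGS, tasks=TASKS, batch_size=1):
    """Return the argv list for an lm_eval run (hypothetical helper)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
    ]


command = build_lm_eval_command()
print(shlex.join(command))  # paste into a shell, or pass `command` to subprocess.run
```

Scores may differ slightly across harness versions and hardware, so treat small deviations from the table as expected.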
## Details
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>
axolotl version: `0.4.0`
```yaml
base_model: pszemraj/llama-3-prune_8
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
strict: false
seed: 80085
# dataset
datasets:
  - path: BEE-spoke-data/KI-smorgasbord_fw-small
    type: completion # format from earlier
    field: text # Optional[str] default: text, field to use for completion data
val_set_size: 0.015
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: false
train_on_inputs: false
group_by_length: false
# WANDB
wandb_project: llama3-pruning
wandb_entity: pszemraj
wandb_watch: gradients
wandb_name: Llama-3-6.3b-v0.1
hub_model_id: pszemraj/Llama-3-6.3b-v0.1
hub_strategy: every_save
gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_torch_fused # paged_adamw_32bit
weight_decay: 0.05
lr_scheduler: cosine
learning_rate: 4e-5
warmup_ratio: 0.1
load_in_8bit: false
load_in_4bit: false
bfloat16: true
tf32: true
flash_attention: true
torch_compile: true # requires >= torch 2.0, may sometimes cause problems
torch_compile_backend: inductor # Optional[str]
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
# hyperparams for freq of evals, saving, etc
evals_per_epoch: 5
saves_per_epoch: 3
save_safetensors: true
save_total_limit: 1
output_dir: ./output-axolotl/output-model-6.3b
logging_steps: 8
deepspeed:
special_tokens:
  pad_token: <|end_of_text|>
```
</details><br>
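
Two quantities implied by the config can be worked out directly: the token budget per optimizer step and the shape of the learning-rate schedule. The sketch below is an approximation under stated assumptions: it treats every packed sequence as completely full (`sample_packing: true`), takes the GPU count as a parameter because the config does not state it, and models `lr_scheduler: cosine` with `warmup_ratio: 0.1` as linear warmup followed by cosine decay to zero. The trainer's exact implementation may differ in details, and `total_steps=1600` is a placeholder rather than the real step count.

```python
import math

# Values taken from the axolotl config above.
SEQUENCE_LEN = 4096
MICRO_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 16
PEAK_LR = 4e-5
WARMUP_RATIO = 0.1


def tokens_per_optimizer_step(num_gpus=1):
    """Upper bound on tokens per optimizer step (assumes fully packed sequences)."""
    return SEQUENCE_LEN * MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * num_gpus


def lr_at(step, total_steps):
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed schedule shape)."""
    warmup_steps = math.ceil(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1600  # placeholder, not taken from the training logs
print("tokens per optimizer step (1 GPU):", tokens_per_optimizer_step())
for step in (0, total_steps // 10, total_steps // 2, total_steps):
    print(f"step {step:>4}: lr = {lr_at(step, total_steps):.2e}")
```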
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log | 0.0006 | 1 | 7.8100 |
| 2.2782 | 0.2002 | 320 | 2.3728 |
| 2.2699 | 0.4004 | 640 | 2.3265 |
| 2.3761 | 0.6006 | 960 | 2.2849 |
| 2.2448 | 0.8008 | 1280 | 2.2702 |
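
The validation losses are easier to interpret as perplexities. Assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, though not stated in this card), perplexity is simply `exp(loss)`. The values below are copied from the table.

```python
import math

# (step, validation loss) pairs from the training results table.
validation_loss = [
    (1, 7.8100),
    (320, 2.3728),
    (640, 2.3265),
    (960, 2.2849),
    (1280, 2.2702),
]


def perplexity(loss_nats):
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(loss_nats)


for step, loss in validation_loss:
    print(f"step {step:>4}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

Read this way, the freshly pruned checkpoint starts with a validation perplexity in the thousands, and the brief continued pretraining brings it below 10 on this validation split.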
---
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_pszemraj__Llama-3-6.3b-v0.1)
| Metric |Value|
|-------------------|----:|
|Avg. |10.28|
|IFEval (0-Shot) |10.44|
|BBH (3-Shot) |18.68|
|MATH Lvl 5 (4-Shot)| 1.51|
|GPQA (0-shot) | 4.47|
|MuSR (0-shot) | 6.15|
|MMLU-PRO (5-shot) |20.44|
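
As a quick consistency check, the "Avg." row appears to be the unweighted mean of the six benchmark scores. The sketch below recomputes it from the values in the table; the dictionary name is illustrative.

```python
# Scores copied from the leaderboard table above.
leaderboard_scores = {
    "IFEval (0-Shot)": 10.44,
    "BBH (3-Shot)": 18.68,
    "MATH Lvl 5 (4-Shot)": 1.51,
    "GPQA (0-shot)": 4.47,
    "MuSR (0-shot)": 6.15,
    "MMLU-PRO (5-shot)": 20.44,
}

# Unweighted mean over all benchmarks.
average = sum(leaderboard_scores.values()) / len(leaderboard_scores)
print(f"Avg.: {average:.2f}")
```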