# tangled-llama-j-128k-v0.1

This model card documents the full recipe for the model: training a BPE tokenizer, preparing the pretraining dataset, pretraining with litgpt, chatting with the final checkpoint, and evaluating it via `litgpt evaluate`.
## Train Tokenizer

```bash
python -B train_tokenizer.py
```
Tokenizer training log:

```text
Resolving data files: 100%|████████████████████████████████████████| 132/132 [00:00<00:00, 266.56it/s]
Loading dataset shards: 100%|████████████████████████████████████████| 18/18 [00:05<00:00, 3.24it/s]
Resolving data files: 100%|████████████████████████████████████████| 133/133 [00:00<00:00, 306844.02it/s]
[00:21:52] Pre-processing sequences ████████████████████████████████████████ 0 / 0
[00:00:48] Tokenize words           ████████████████████████████████████████ 25635525 / 25635525
[00:01:17] Count pairs              ████████████████████████████████████████ 25635525 / 25635525
[00:06:07] Compute merges           ████████████████████████████████████████ 32066 / 32066
```
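`train_tokenizer.py` itself is not included in this card, but the log above matches the standard progress output of the Hugging Face `tokenizers` BPE trainer (pre-processing, word tokenization, pair counting, merge computation). Below is a minimal sketch of such a script, assuming a byte-level BPE tokenizer with a roughly 32k vocabulary trained on a streamed dataset; the dataset path, batch size, and special tokens are illustrative, not taken from this repo:

```python
from datasets import load_dataset
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

# Stream the corpus so it never has to fit in memory.
# NOTE: the dataset path is a placeholder; this card does not name the actual corpus.
dataset = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)

def batch_iterator(batch_size=1000):
    batch = []
    for row in dataset:
        batch.append(row["text"])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Byte-level BPE: 256 base byte tokens plus learned merges.
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()

trainer = trainers.BpeTrainer(
    vocab_size=32768,  # consistent with the "Compute merges 32066 / 32066" step above
    special_tokens=["<unk>", "<s>", "</s>"],  # placeholders; the actual specials are unknown
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)

tokenizer.train_from_iterator(batch_iterator(), trainer=trainer)
tokenizer.save("tokenizer.json")
```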
## Pretrain

```bash
python -B prepare_pretrain_dataset.py
```
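`prepare_pretrain_dataset.py` is likewise not reproduced here. litgpt's pretraining pipeline typically consumes a corpus that has been pre-tokenized into litdata streaming chunks, so a sketch along those lines follows; the tokenizer directory, input file glob, chunk size, and worker count are all illustrative assumptions:

```python
from functools import partial
from pathlib import Path

from litdata import optimize
from litgpt import Tokenizer

def tokenize_fn(filepath, tokenizer=None):
    # One token tensor per document, terminated with EOS so documents
    # can be packed back-to-back during pretraining.
    text = Path(filepath).read_text(encoding="utf-8")
    return tokenizer.encode(text, bos=False, eos=True)

if __name__ == "__main__":
    tokenizer = Tokenizer("checkpoints/tokenizer")  # hypothetical tokenizer dir
    optimize(
        fn=partial(tokenize_fn, tokenizer=tokenizer),
        inputs=[str(p) for p in Path("data/raw").glob("*.txt")],  # placeholder corpus
        output_dir="data/pretrain",
        chunk_size=(2049 * 8012),  # tokens per chunk: (block_size + 1) times a packing factor
        num_workers=8,
    )
```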
```bash
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
  litgpt pretrain --config pretrain-model.yaml
```
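The referenced `pretrain-model.yaml` is not shown in this card. For orientation, litgpt pretrain configs generally follow the shape below; every value in this skeleton is illustrative and not the actual configuration of this run:

```yaml
# Illustrative skeleton only; not the actual pretrain-model.yaml of this run.
model_config:
  name: tangled-llama-j-128k-v0.1
  block_size: 131072        # 128k context, per the model name
  vocab_size: 32768
  # ... remaining Llama-style architecture fields ...

out_dir: out/pretrain
precision: bf16-true
tokenizer_dir: checkpoints/tokenizer

data:
  class_path: litgpt.data.LitData
  init_args:
    data_path: data/pretrain

train:
  global_batch_size: 512
  micro_batch_size: 8
  lr_warmup_steps: 2000
  max_tokens: 10000000000   # placeholder token budget

eval:
  interval: 1000
  max_iters: 100
```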
## Chat with Pretrained Model

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True CUDA_VISIBLE_DEVICES="0" litgpt chat out/pretrain/final/
```
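Besides the interactive CLI, the same checkpoint can be queried programmatically through litgpt's high-level Python API, for example:

```python
from litgpt import LLM

# Load the final pretraining checkpoint produced above.
llm = LLM.load("out/pretrain/final/")

# The prompt is illustrative; this is a plain pretrained LM, not instruction-tuned,
# so expect free-form continuations rather than chat-style answers.
print(llm.generate("The capital of France is", max_new_tokens=32))
```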
## Evaluation

### Quick

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True CUDA_VISIBLE_DEVICES="0" litgpt evaluate \
  --tasks 'hellaswag,gsm8k,truthfulqa_mc2,mmlu,winogrande,arc_challenge' \
  --out_dir 'evaluate-quick/' --batch_size 8 --dtype 'bfloat16' \
  out/pretrain/final/
```
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|---------------------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|arc_challenge | 1|none | 0|acc |↑ |0.1715|± |0.0110|
| | |none | 0|acc_norm |↑ |0.2150|± |0.0120|
|gsm8k | 3|flexible-extract| 5|exact_match|↑ |0.0136|± |0.0032|
| | |strict-match | 5|exact_match|↑ |0.0114|± |0.0029|
|hellaswag | 1|none | 0|acc |↑ |0.2715|± |0.0044|
| | |none | 0|acc_norm |↑ |0.2819|± |0.0045|
|mmlu | 2|none | |acc |↑ |0.2307|± |0.0036|
| - humanities | 2|none | |acc |↑ |0.2436|± |0.0063|
| - formal_logic | 1|none | 0|acc |↑ |0.3175|± |0.0416|
| - high_school_european_history | 1|none | 0|acc |↑ |0.2606|± |0.0343|
| - high_school_us_history | 1|none | 0|acc |↑ |0.2598|± |0.0308|
| - high_school_world_history | 1|none | 0|acc |↑ |0.2700|± |0.0289|
| - international_law | 1|none | 0|acc |↑ |0.2479|± |0.0394|
| - jurisprudence | 1|none | 0|acc |↑ |0.2870|± |0.0437|
| - logical_fallacies | 1|none | 0|acc |↑ |0.2209|± |0.0326|
| - moral_disputes | 1|none | 0|acc |↑ |0.2457|± |0.0232|
| - moral_scenarios | 1|none | 0|acc |↑ |0.2380|± |0.0142|
| - philosophy | 1|none | 0|acc |↑ |0.1833|± |0.0220|
| - prehistory | 1|none | 0|acc |↑ |0.2160|± |0.0229|
| - professional_law | 1|none | 0|acc |↑ |0.2438|± |0.0110|
| - world_religions | 1|none | 0|acc |↑ |0.2924|± |0.0349|
| - other | 2|none | |acc |↑ |0.2385|± |0.0076|
| - business_ethics | 1|none | 0|acc |↑ |0.2900|± |0.0456|
| - clinical_knowledge | 1|none | 0|acc |↑ |0.2075|± |0.0250|
| - college_medicine | 1|none | 0|acc |↑ |0.2139|± |0.0313|
| - global_facts | 1|none | 0|acc |↑ |0.1800|± |0.0386|
| - human_aging | 1|none | 0|acc |↑ |0.3139|± |0.0311|
| - management | 1|none | 0|acc |↑ |0.1748|± |0.0376|
| - marketing | 1|none | 0|acc |↑ |0.2991|± |0.0300|
| - medical_genetics | 1|none | 0|acc |↑ |0.2800|± |0.0451|
| - miscellaneous | 1|none | 0|acc |↑ |0.2363|± |0.0152|
| - nutrition | 1|none | 0|acc |↑ |0.2157|± |0.0236|
| - professional_accounting | 1|none | 0|acc |↑ |0.2376|± |0.0254|
| - professional_medicine | 1|none | 0|acc |↑ |0.1838|± |0.0235|
| - virology | 1|none | 0|acc |↑ |0.2892|± |0.0353|
| - social sciences | 2|none | |acc |↑ |0.2181|± |0.0074|
| - econometrics | 1|none | 0|acc |↑ |0.2368|± |0.0400|
| - high_school_geography | 1|none | 0|acc |↑ |0.1768|± |0.0272|
| - high_school_government_and_politics| 1|none | 0|acc |↑ |0.2073|± |0.0293|
| - high_school_macroeconomics | 1|none | 0|acc |↑ |0.2103|± |0.0207|
| - high_school_microeconomics | 1|none | 0|acc |↑ |0.2101|± |0.0265|
| - high_school_psychology | 1|none | 0|acc |↑ |0.1927|± |0.0169|
| - human_sexuality | 1|none | 0|acc |↑ |0.2519|± |0.0381|
| - professional_psychology | 1|none | 0|acc |↑ |0.2533|± |0.0176|
| - public_relations | 1|none | 0|acc |↑ |0.2182|± |0.0396|
| - security_studies | 1|none | 0|acc |↑ |0.1796|± |0.0246|
| - sociology | 1|none | 0|acc |↑ |0.2438|± |0.0304|
| - us_foreign_policy | 1|none | 0|acc |↑ |0.2700|± |0.0446|
| - stem | 2|none | |acc |↑ |0.2160|± |0.0073|
| - abstract_algebra | 1|none | 0|acc |↑ |0.2000|± |0.0402|
| - anatomy | 1|none | 0|acc |↑ |0.1852|± |0.0336|
| - astronomy | 1|none | 0|acc |↑ |0.1776|± |0.0311|
| - college_biology | 1|none | 0|acc |↑ |0.2569|± |0.0365|
| - college_chemistry | 1|none | 0|acc |↑ |0.1900|± |0.0394|
| - college_computer_science | 1|none | 0|acc |↑ |0.2700|± |0.0446|
| - college_mathematics | 1|none | 0|acc |↑ |0.2200|± |0.0416|
| - college_physics | 1|none | 0|acc |↑ |0.2255|± |0.0416|
| - computer_security | 1|none | 0|acc |↑ |0.3000|± |0.0461|
| - conceptual_physics | 1|none | 0|acc |↑ |0.2638|± |0.0288|
| - electrical_engineering | 1|none | 0|acc |↑ |0.2276|± |0.0349|
| - elementary_mathematics | 1|none | 0|acc |↑ |0.2037|± |0.0207|
| - high_school_biology | 1|none | 0|acc |↑ |0.1903|± |0.0223|
| - high_school_chemistry | 1|none | 0|acc |↑ |0.1823|± |0.0272|
| - high_school_computer_science | 1|none | 0|acc |↑ |0.2500|± |0.0435|
| - high_school_mathematics | 1|none | 0|acc |↑ |0.2259|± |0.0255|
| - high_school_physics | 1|none | 0|acc |↑ |0.2119|± |0.0334|
| - high_school_statistics | 1|none | 0|acc |↑ |0.1574|± |0.0248|
| - machine_learning | 1|none | 0|acc |↑ |0.2768|± |0.0425|
|truthfulqa_mc2 | 2|none | 0|acc |↑ |0.4649|± |0.0155|
|winogrande | 1|none | 0|acc |↑ |0.4988|± |0.0141|

| Groups |Version|Filter|n-shot|Metric| |Value | |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.2307|± |0.0036|
| - humanities | 2|none | |acc |↑ |0.2436|± |0.0063|
| - other | 2|none | |acc |↑ |0.2385|± |0.0076|
| - social sciences| 2|none | |acc |↑ |0.2181|± |0.0074|
| - stem | 2|none | |acc |↑ |0.2160|± |0.0073|
### Leaderboard

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True CUDA_VISIBLE_DEVICES="0" litgpt evaluate \
  --tasks 'leaderboard' --out_dir 'evaluate-leaderboard/' --batch_size 8 --dtype 'bfloat16' \
  out/pretrain/final/
```
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-----------------------------------------------------------|-------|------|-----:|-----------------------|---|-----:|---|------|
|leaderboard | N/A| | | | | | | |
| - leaderboard_bbh | N/A| | | | | | | |
| - leaderboard_bbh_boolean_expressions | 1|none | 3|acc_norm |↑ |0.5080|± |0.0317|
| - leaderboard_bbh_causal_judgement | 1|none | 3|acc_norm |↑ |0.5187|± |0.0366|
| - leaderboard_bbh_date_understanding | 1|none | 3|acc_norm |↑ |0.2000|± |0.0253|
| - leaderboard_bbh_disambiguation_qa | 1|none | 3|acc_norm |↑ |0.3240|± |0.0297|
| - leaderboard_bbh_formal_fallacies | 1|none | 3|acc_norm |↑ |0.5280|± |0.0316|
| - leaderboard_bbh_geometric_shapes | 1|none | 3|acc_norm |↑ |0.2200|± |0.0263|
| - leaderboard_bbh_hyperbaton | 1|none | 3|acc_norm |↑ |0.5160|± |0.0317|
| - leaderboard_bbh_logical_deduction_five_objects | 1|none | 3|acc_norm |↑ |0.1840|± |0.0246|
| - leaderboard_bbh_logical_deduction_seven_objects | 1|none | 3|acc_norm |↑ |0.1480|± |0.0225|
| - leaderboard_bbh_logical_deduction_three_objects | 1|none | 3|acc_norm |↑ |0.3360|± |0.0299|
| - leaderboard_bbh_movie_recommendation | 1|none | 3|acc_norm |↑ |0.2640|± |0.0279|
| - leaderboard_bbh_navigate | 1|none | 3|acc_norm |↑ |0.4200|± |0.0313|
| - leaderboard_bbh_object_counting | 1|none | 3|acc_norm |↑ |0.0680|± |0.0160|
| - leaderboard_bbh_penguins_in_a_table | 1|none | 3|acc_norm |↑ |0.1986|± |0.0331|
| - leaderboard_bbh_reasoning_about_colored_objects | 1|none | 3|acc_norm |↑ |0.1440|± |0.0222|
| - leaderboard_bbh_ruin_names | 1|none | 3|acc_norm |↑ |0.2400|± |0.0271|
| - leaderboard_bbh_salient_translation_error_detection | 1|none | 3|acc_norm |↑ |0.1960|± |0.0252|
| - leaderboard_bbh_snarks | 1|none | 3|acc_norm |↑ |0.5169|± |0.0376|
| - leaderboard_bbh_sports_understanding | 1|none | 3|acc_norm |↑ |0.4600|± |0.0316|
| - leaderboard_bbh_temporal_sequences | 1|none | 3|acc_norm |↑ |0.2720|± |0.0282|
| - leaderboard_bbh_tracking_shuffled_objects_five_objects | 1|none | 3|acc_norm |↑ |0.2080|± |0.0257|
| - leaderboard_bbh_tracking_shuffled_objects_seven_objects| 1|none | 3|acc_norm |↑ |0.1280|± |0.0212|
| - leaderboard_bbh_tracking_shuffled_objects_three_objects| 1|none | 3|acc_norm |↑ |0.3040|± |0.0292|
| - leaderboard_bbh_web_of_lies | 1|none | 3|acc_norm |↑ |0.4880|± |0.0317|
| - leaderboard_gpqa | N/A| | | | | | | |
| - leaderboard_gpqa_diamond | 1|none | 0|acc_norm |↑ |0.2020|± |0.0286|
| - leaderboard_gpqa_extended | 1|none | 0|acc_norm |↑ |0.2637|± |0.0189|
| - leaderboard_gpqa_main | 1|none | 0|acc_norm |↑ |0.2589|± |0.0207|
| - leaderboard_ifeval | 3|none | 0|inst_level_loose_acc |↑ |0.2554|± | N/A|
| | |none | 0|inst_level_strict_acc |↑ |0.2458|± | N/A|
| | |none | 0|prompt_level_loose_acc |↑ |0.1275|± |0.0144|
| | |none | 0|prompt_level_strict_acc|↑ |0.1220|± |0.0141|
| - leaderboard_math_hard | N/A| | | | | | | |
| - leaderboard_math_algebra_hard | 1|none | 4|exact_match |↑ |0.0033|± |0.0033|
| - leaderboard_math_counting_and_prob_hard | 1|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_geometry_hard | 1|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_intermediate_algebra_hard | 1|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_num_theory_hard | 1|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_math_prealgebra_hard | 1|none | 4|exact_match |↑ |0.0052|± |0.0052|
| - leaderboard_math_precalculus_hard | 1|none | 4|exact_match |↑ |0.0000|± | 0|
| - leaderboard_mmlu_pro | 0.1|none | 5|acc |↑ |0.1074|± |0.0028|
| - leaderboard_musr | N/A| | | | | | | |
| - leaderboard_musr_murder_mysteries | 1|none | 0|acc_norm |↑ |0.4960|± |0.0317|
| - leaderboard_musr_object_placements | 1|none | 0|acc_norm |↑ |0.2266|± |0.0262|
| - leaderboard_musr_team_allocation | 1|none | 0|acc_norm |↑ |0.3800|± |0.0308|