	---
	base_model: EleutherAI/pythia-160m-deduped
	library_name: transformers
	license: apache-2.0
	tags:
	- axolotl
	- relora
	- generated_from_trainer
	model-index:
	- name: pythia-160m-dolphin-extended
	  results: []
	datasets:
	- cognitivecomputations/dolphin
	- llamafactory/alpaca_gpt4_en
	language:
	- en
	metrics:
	- accuracy
	- bleu
	- rouge
	---


	[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
	<details><summary>See axolotl config</summary>

	axolotl version: `0.4.1`
	```yaml
	base_model: EleutherAI/pythia-160m-deduped
	load_in_8bit:
	datasets:
	- path: vicgalle/alpaca-gpt4
	  type: alpaca
	- path: llamafactory/alpaca_gpt4_en
	  type: alpaca
	- path: cognitivecomputations/dolphin
	  name: flan1m-alpaca-uncensored
	  type: alpaca
	  shards: 10

	dataset_prepared_path: ds-mega-alpaca
	#dataset_shard_num: 10
	chat_template: inst
	val_set_size: 0.001
	adapter: lora
	lora_model_dir:
	sequence_len: 2048
	lora_r: 16
	lora_alpha: 16
	lora_dropout: 0.05
	lora_target_modules:
	- query_key_value
	lora_target_linear:
	lora_fan_in_fan_out: true # pythia/GPTNeoX lora specific
	lora_modules_to_save:
	- embed_in
	- embed_out
	- lm_head
	lora_on_cpu: false
	# ReLoRA configuration
	# # Must use either 'lora' or 'qlora' adapter, and does not support fsdp or deepspeed
	# relora_steps: # Number of steps per ReLoRA restart
	# relora_warmup_steps: # Number of per-restart warmup steps
	# relora_anneal_steps: # Number of anneal steps for each relora cycle
	# relora_prune_ratio: # threshold for optimizer magnitude when pruning
	# relora_cpu_offload: # True to perform lora weight merges on cpu during restarts, for modest gpu memory savings
	relora_steps: 600
	relora_warmup_steps: 10
	relora_cpu_offload: true
	wandb_project: pythia
	wandb_entity:
	wandb_watch:
	wandb_name: pythia-160m-dolphin-extended
	wandb_log_model:
	output_dir: ./outputs/lora-alpaca-pythia-160m-dolphin-extended
	gradient_accumulation_steps: 16
	micro_batch_size: 1
	num_epochs: 1
	learning_rate: 0.0004
	lr_scheduler: cosine_with_restarts
	#cosine_min_lr_ratio: 0.1
	train_on_inputs: false
	group_by_length: false
	#bf16: auto
	#fp16: true
	#tf32: false
	float16: true
	flash_attn:
	xformers_attention: true
	optimizer: paged_adamw_8bit
	gpu_memory_limit: 8GiB
	hub_model_id: jtatman/pythia-160m-dolphin-extended
	early_stopping_patience: 10
	#resume_from_checkpoint: outputs/lora-alpaca-pythia-160m-dolphin-extended/checkpoint-11400
	auto_resume_from_checkpoints: true
	local_rank:
	weight_decay: 0.0
	#evals_per_epoch: 4
	eval_steps: 200
	logging_steps: 1
	save_steps: 200
	save_total_limit: 5
	warmup_steps: 100
	tokens:
	- "[INST]"
	- "[/INST]"

	```

	</details><br>

	# pythia-160m-dolphin-extended

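	This model is a fine-tuned version of [EleutherAI/pythia-160m-deduped](https://huggingface.co/EleutherAI/pythia-160m-deduped) on the `vicgalle/alpaca-gpt4`, `llamafactory/alpaca_gpt4_en`, and `cognitivecomputations/dolphin` (`flan1m-alpaca-uncensored` subset) datasets.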
	It achieves the following results on the evaluation set:
	- Loss: 6.6729

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed
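
A minimal loading sketch, not an official usage recipe. The repository id comes from `hub_model_id` in the axolotl config above. The prompt template is an assumption: it is the standard Alpaca format that `type: alpaca` datasets usually imply, and the card does not confirm it. If the repository holds only a LoRA adapter rather than merged weights, loading it this way also requires `peft` to be installed.

```python
# ASSUMPTION: standard Alpaca prompt layout; the card does not state the template.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

# Taken from `hub_model_id` in the axolotl config.
MODEL_ID = "jtatman/pythia-160m-dolphin-extended"


def build_prompt(instruction: str) -> str:
    """Wrap a bare instruction in the (assumed) Alpaca prompt format."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


def generate(instruction: str, max_new_tokens: int = 64) -> str:
    """Download the model and return only the newly generated text."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Slice off the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Name three primary colors.")` downloads the weights on first use. Given the final evaluation loss reported on this card, outputs from a 160M-parameter model should be treated as experimental.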

	## Training and evaluation data

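	Training used the three Alpaca-format datasets listed in the axolotl config above: `vicgalle/alpaca-gpt4`, `llamafactory/alpaca_gpt4_en`, and the `flan1m-alpaca-uncensored` subset of `cognitivecomputations/dolphin` (configured with `shards: 10`). A fraction of 0.001 of the prepared data was held out for evaluation (`val_set_size: 0.001`). No further information is available.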

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0004
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine_with_restarts
	- lr_scheduler_warmup_steps: 100
	- num_epochs: 1
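
The batch-size figures above are related by simple arithmetic, sketched below. The single-device count is an assumption; it is the value consistent with a total batch size of 16 but is not stated on the card. The steps-per-epoch estimate uses the last logged row of the training results (step 11800 at epoch 0.9840) and is only as precise as the four-decimal epoch value.

```python
def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Sequences consumed per optimizer step."""
    return micro_batch_size * gradient_accumulation_steps * num_devices


def estimate_steps_per_epoch(step: int, epoch_fraction: float) -> int:
    """Extrapolate the full-epoch step count from one logged (step, epoch) pair."""
    return round(step / epoch_fraction)


# Values from the hyperparameter list; num_devices=1 is an ASSUMPTION.
total_train_batch_size = effective_batch_size(1, 16, num_devices=1)

# Last logged row of the training results table.
steps_per_epoch = estimate_steps_per_epoch(11800, 0.9840)

print("total_train_batch_size:", total_train_batch_size)
print("estimated optimizer steps per epoch:", steps_per_epoch)
print("estimated training sequences per epoch:",
      steps_per_epoch * total_train_batch_size)
```

Under these assumptions one epoch is roughly 12,000 optimizer steps.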

	### Training results

	| Training Loss | Epoch  | Step  | Validation Loss |
	|:-------------:|:------:|:-----:|:---------------:|
	| 25.9906 | 0.0001 | 1 | 29.5342 |
	| 21.1303 | 0.0167 | 200 | 20.2350 |
	| 16.5026 | 0.0334 | 400 | 18.4930 |
	| 17.2725 | 0.0500 | 600 | 16.3395 |
	| 11.9697 | 0.0667 | 800 | 12.1401 |
	| 11.3783 | 0.0834 | 1000 | 11.8383 |
	| 12.8084 | 0.1001 | 1200 | 12.9667 |
	| 9.4119 | 0.1167 | 1400 | 9.8787 |
	| 10.3527 | 0.1334 | 1600 | 10.0560 |
	| 9.3545 | 0.1501 | 1800 | 9.7355 |
	| 8.9165 | 0.1668 | 2000 | 9.1513 |
	| 8.5467 | 0.1835 | 2200 | 8.2025 |
	| 7.9152 | 0.2001 | 2400 | 7.6616 |
	| 7.3362 | 0.2168 | 2600 | 7.5699 |
	| 7.9374 | 0.2335 | 2800 | 7.4818 |
	| 7.838 | 0.2502 | 3000 | 7.4635 |
	| 7.5731 | 0.2668 | 3200 | 7.4899 |
	| 7.8289 | 0.2835 | 3400 | 7.3594 |
	| 7.8906 | 0.3002 | 3600 | 8.0934 |
	| 7.7318 | 0.3169 | 3800 | 7.5812 |
	| 7.2089 | 0.3335 | 4000 | 7.4839 |
	| 7.202 | 0.3502 | 4200 | 7.4486 |
	| 6.9493 | 0.3669 | 4400 | 7.3208 |
	| 7.1492 | 0.3836 | 4600 | 7.2469 |
	| 7.3443 | 0.4003 | 4800 | 7.1378 |
	| 7.7056 | 0.4169 | 5000 | 7.1385 |
	| 55.0553 | 0.4336 | 5200 | 50.0135 |
	| 7.1868 | 0.4503 | 5400 | 6.9898 |
	| 6.5803 | 0.4670 | 5600 | 6.9559 |
	| 8.6171 | 0.4836 | 5800 | 7.9075 |
	| 7.1373 | 0.5003 | 6000 | 6.9280 |
	| 6.7077 | 0.5170 | 6200 | 6.8797 |
	| 7.0026 | 0.5337 | 6400 | 6.8635 |
	| 6.6797 | 0.5504 | 6600 | 6.8178 |
	| 6.8067 | 0.5670 | 6800 | 6.7893 |
	| 6.5979 | 0.5837 | 7000 | 6.8106 |
	| 6.7283 | 0.6004 | 7200 | 6.7998 |
	| 7.0015 | 0.6171 | 7400 | 6.7705 |
	| 6.1182 | 0.6337 | 7600 | 6.7592 |
	| 6.7919 | 0.6504 | 7800 | 6.7446 |
	| 6.4523 | 0.6671 | 8000 | 6.7260 |
	| 6.765 | 0.6838 | 8200 | 6.7135 |
	| 6.4625 | 0.7004 | 8400 | 6.7099 |
	| 6.79 | 0.7171 | 8600 | 6.7070 |
	| 6.6101 | 0.7338 | 8800 | 6.7017 |
	| 6.7541 | 0.7505 | 9000 | 6.6964 |
	| 6.7777 | 0.7672 | 9200 | 6.6901 |
	| 7.2082 | 0.7838 | 9400 | 6.6869 |
	| 6.4263 | 0.8005 | 9600 | 6.6875 |
	| 6.1944 | 0.8172 | 9800 | 6.6803 |
	| 6.7745 | 0.8339 | 10000 | 6.6865 |
	| 6.6746 | 0.8505 | 10200 | 6.6756 |
	| 6.6319 | 0.8672 | 10400 | 6.6941 |
	| 6.6657 | 0.8839 | 10600 | 6.6764 |
	| 6.8516 | 0.9006 | 10800 | 6.6776 |
	| 6.6391 | 0.9173 | 11000 | 6.6749 |
	| 6.5763 | 0.9339 | 11200 | 6.6729 |
	| 6.585 | 0.9506 | 11400 | 6.6694 |
	| 6.2999 | 0.9673 | 11600 | 6.6722 |
	| 6.8343 | 0.9840 | 11800 | 6.6729 |
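
To put the final validation loss in more familiar terms, it can be converted to perplexity. This sketch assumes the reported loss is a mean per-token cross-entropy in nats, which is what the Trainer normally logs but which the card does not state explicitly.

```python
import math


def perplexity(mean_nll_nats: float) -> float:
    """Perplexity is the exponential of the mean negative log-likelihood."""
    return math.exp(mean_nll_nats)


# Final validation loss from the last row of the table (step 11800).
final_eval_loss = 6.6729

print(f"validation perplexity: {perplexity(final_eval_loss):.1f}")
```

A perplexity in the high hundreds is large for a language model, which matches the near-chance benchmark scores reported on this card.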


	### Framework versions

	- PEFT 0.11.1
	- Transformers 4.41.2
	- Pytorch 2.3.0+cu121
	- Datasets 2.19.1
	- Tokenizers 0.19.1

	### Evaluation Results

	| Groups | Version | Filter | n-shot | Metric | Value | | Stderr |
	|--------------------|-------|----------------|-----:|-----------|------:|---|-----:|
	| Open LLM Leaderboard | N/A | none | 5 | rouge2_max | 16.4873 | ± | 1.0172 |
	| - winogrande | 1 | none | 5 | acc | 0.5120 | ± | 0.0224 |
	| - gsm8k | 3 | strict-match | 5 | exact_match | 0.0060 | ± | 0.0035 |
	| - hellaswag | 1 | none | 10 | acc | 0.3520 | ± | 0.0214 |
	| - mmlu | N/A | none | 0 | acc | 0.2533 | ± | 0.0039 |
	| | | none | 5 | rouge2_acc | 0.1920 | ± | 0.0176 |
	| | | none | 5 | rougeL_acc | 0.3860 | ± | 0.0218 |
	| | | flexible-extract | 5 | exact_match | 0.0220 | ± | 0.0066 |
	| | | strict-match | 5 | exact_match | 0.0060 | ± | 0.0035 |
	| | | none | 5 | rougeL_diff | -0.7765 | ± | 1.0034 |
	| | | none | 5 | rouge1_acc | 0.3700 | ± | 0.0216 |
	| | | none | 5 | rouge1_diff | -1.5564 | ± | 1.0223 |
	| | | none | 5 | acc_norm | 0.3180 | ± | 0.0145 |
	| | | none | 5 | bleu_diff | -0.6500 | ± | 0.6421 |
	| | | none | 5 | rouge1_max | 36.3550 | ± | 0.9462 |
	| | | none | 5 | acc | 0.2664 | ± | 0.0036 |
	| | | none | 5 | rougeL_max | 33.8798 | ± | 0.9367 |
	| | | none | 5 | bleu_max | 15.2292 | ± | 0.6714 |
	| | | none | 5 | bleu_acc | 0.4360 | ± | 0.0222 |
	| | | none | 5 | rouge2_diff | -3.3178 | ± | 0.9477 |
	| - mmlu | N/A | none | 0 | acc | 0.2533 | ± | 0.0039 |
	| - humanities | N/A | none | 5 | acc | 0.2408 | ± | 0.0075 |
	| - other | N/A | none | 5 | acc | 0.2443 | ± | 0.0080 |
	| - social_sciences | N/A | none | 5 | acc | 0.2538 | ± | 0.0081 |
	| - stem | N/A | none | 5 | acc | 0.2740 | ± | 0.0079 |
	| - truthfulqa | N/A | none | 0 | rouge2_max | 16.4873 | ± | 1.0172 |
	| | | none | 0 | rouge2_acc | 0.1920 | ± | 0.0176 |
	| | | none | 0 | rougeL_acc | 0.3860 | ± | 0.0218 |
	| | | none | 0 | rougeL_diff | -0.7765 | ± | 1.0034 |
	| | | none | 0 | rouge1_acc | 0.3700 | ± | 0.0216 |
	| | | none | 0 | rouge1_diff | -1.5564 | ± | 1.0223 |
	| | | none | 0 | bleu_diff | -0.6500 | ± | 0.6421 |
	| | | none | 0 | rouge1_max | 36.3550 | ± | 0.9462 |
	| | | none | 0 | acc | 0.3435 | ± | 0.0137 |
	| | | none | 0 | rougeL_max | 33.8798 | ± | 0.9367 |
	| | | none | 0 | bleu_max | 15.2292 | ± | 0.6714 |
	| | | none | 0 | bleu_acc | 0.4360 | ± | 0.0222 |
	| | | none | 0 | rouge2_diff | -3.3178 | ± | 0.9477 |
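
One rough way to read the multiple-choice accuracies above is to compare each with its random-guess baseline, in units of the reported standard error. The chance levels below are not from the card; they follow from the task formats (two options for WinoGrande, four for HellaSwag and MMLU). The helper name `z_vs_chance` is illustrative only.

```python
# (accuracy, stderr) from the table above; chance level from the task format.
RESULTS = {
    "winogrande": (0.5120, 0.0224, 0.50),
    "hellaswag": (0.3520, 0.0214, 0.25),
    "mmlu": (0.2533, 0.0039, 0.25),
}


def z_vs_chance(acc: float, stderr: float, chance: float) -> float:
    """Distance from the random-guess baseline, in standard errors."""
    return (acc - chance) / stderr


for task, (acc, stderr, chance) in RESULTS.items():
    z = z_vs_chance(acc, stderr, chance)
    verdict = "above chance" if z > 2 else "within ~2 stderr of chance"
    print(f"{task:<11} acc={acc:.4f} chance={chance:.2f} z={z:+.2f} ({verdict})")
```

By this measure WinoGrande and MMLU are statistically indistinguishable from guessing, while HellaSwag is clearly above its baseline.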