---
license: apache-2.0
library_name: peft
tags:
- axolotl
- dpo
- trl
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: mistral-7b-base-dpo-run
results: []
---
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>
axolotl version: `0.4.0`
```yaml
base_model: mistralai/Mistral-7B-v0.1
base_model_ignore_patterns: []
base_model_config: mistralai/Mistral-7B-v0.1
model_revision:
tokenizer_config:
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true
tokenizer_use_fast: true
tokenizer_legacy: true
resize_token_embeddings_to_32x: false
is_falcon_derived_model: false
is_llama_derived_model: false
is_mistral_derived_model: true
is_qwen_derived_model: false
model_config:
rope_scaling:
bnb_config_kwargs:
gptq: false
gptq_groupsize:
gptq_model_v1: false
load_in_8bit: false
load_in_4bit: true
fp16: true
lora_on_cpu: false
rl: dpo
datasets:
- path: NobodyExistsOnTheInternet/Fixed-gutenberg-dpo-v0.1
split: train
type: chatml.intel
- path: NobodyExistsOnTheInternet/Fixed-Distilabel-intel-orca-dpo-pairs
split: train
type: chatml.intel
- path: NobodyExistsOnTheInternet/ToxicDPOqa
split: train
type: chatml.intel
- path: NobodyExistsOnTheInternet/system-message-DPO
split: train
type: chatml.intel
- path: NobodyExistsOnTheInternet/alpaca-intel-data-dpo
split: train
type: chatml.intel
- path: NobodyExistsOnTheInternet/ToxicDPOqa
split: train
type: chatml.intel
chat_template: chatml
default_system_message: Generate a preferable answer.
dataset_prepared_path: data/last_run_prepared
push_dataset_to_hub:
dataset_processes:
dataset_keep_in_memory:
hub_model_id: NobodyExistsOnTheInternet/mistral-7b-base-dpo-run
hub_strategy: every_save
hf_use_auth_token: true
val_set_size: 0
dataset_shard_num:
dataset_shard_idx:
sequence_len: 1024
sample_packing: false
eval_sample_packing:
sample_packing_eff_est:
total_num_tokens:
device_map:
max_memory:
adapter: qlora
lora_model_dir:
lora_r: 32
lora_alpha: 64
lora_dropout: 0.05
lora_target_linear: true
lora_target_module:
lora_modules_to_save:
- embed_tokens
- lm_head
lora_fan_in_fan_out:
wandb_project: dpo-hermes-2.5
wandb_entity:
wandb_watch:
wandb_name:
wandb_run_id:
wandb_log_model:
mlflow_tracking_uri:
mlflow_experiment_name:
output_dir: ./completed-model
torch_compile: true
gradient_accumulation_steps: 4
micro_batch_size: 1
eval_batch_size:
num_epochs: 2
warmup_steps: 100
warmup_ratio:
learning_rate: 0.000001
lr_quadratic_warmup:
logging_steps:
eval_steps:
evals_per_epoch:
save_strategy: steps
save_steps: 1000
saves_per_epoch:
save_total_limit:
eval_table_size:
eval_max_new_tokens:
eval_causal_lm_metrics:
loss_watchdog_threshold:
loss_watchdog_patience:
train_on_inputs: false
group_by_length: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
lr_scheduler:
optimizer: paged_adamw_8bit
weight_decay: 0.01
adam_beta1: 0.95
adam_beta2: 0.999
adam_epsilon: 0.0000001
neftune_noise_alpha: 5
flash_optimum:
xformers_attention:
flash_attention: true
flash_attn_cross_entropy:
flash_attn_rms_norm:
flash_attn_fuse_qkv:
flash_attn_fuse_mlp:
sdp_attention:
s2_attention:
resume_from_checkpoint:
auto_resume_from_checkpoints: false
local_rank:
tokens:
fsdp:
fsdp_config:
deepspeed:
ddp_timeout:
ddp_bucket_cap_mb:
ddp_broadcast_buffers:
torchdistx_path:
pretraining_dataset:
debug:
seed:
```
</details><br>
# mistral-7b-base-dpo-run
This model is a DPO fine-tune (QLoRA adapter) of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), trained on the preference datasets listed in the axolotl config above.
## Model description
This is a 4-bit QLoRA adapter for Mistral-7B-v0.1 trained with Direct Preference Optimization (`rl: dpo`) using axolotl 0.4.0 and TRL. The LoRA targets all linear layers (`lora_r: 32`, `lora_alpha: 64`, `lora_dropout: 0.05`) and additionally saves the `embed_tokens` and `lm_head` modules, so the tokenizer published with this repository should be used together with the adapter.
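A minimal loading sketch (not an official snippet from the author): it assumes the adapter weights live at `NobodyExistsOnTheInternet/mistral-7b-base-dpo-run` (the `hub_model_id` in the config) and loads the base model in 4-bit, matching `load_in_4bit: true` and `fp16: true` above.

```python
# Sketch: load the QLoRA adapter on top of the 4-bit base model.
# Assumes `transformers`, `peft`, and `bitsandbytes` are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "NobodyExistsOnTheInternet/mistral-7b-base-dpo-run"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # matches load_in_4bit: true in the config
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute, matching fp16: true
)

# The tokenizer is loaded from the adapter repo; if it is missing there,
# fall back to the base model tokenizer.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO-trained adapter
model.eval()
```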
## Intended uses & limitations
The adapter is intended for ChatML-style chat prompting (see the prompting sketch below). Note that the training mix includes NobodyExistsOnTheInternet/ToxicDPOqa, so the model may produce unfiltered or unsafe content; apply your own moderation before deployment. No evaluation results are reported, so output quality should be verified independently.
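Since the run uses `chat_template: chatml` with `default_system_message: Generate a preferable answer.`, prompts should follow the ChatML layout. Below is a hedged sketch, continuing from the loading sketch above, that builds such a prompt by hand; the exact special tokens ultimately come from the tokenizer shipped with the adapter.

```python
# Sketch: build a ChatML prompt matching chat_template: chatml above.
system = "Generate a preferable answer."  # default_system_message from the config
user = "Summarise the plot of Moby-Dick in two sentences."  # example question

prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```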
## Training and evaluation data
Training used the following preference datasets, all consumed with axolotl's `chatml.intel` DPO prompt strategy on their `train` splits (ToxicDPOqa appears twice in the config):

- NobodyExistsOnTheInternet/Fixed-gutenberg-dpo-v0.1
- NobodyExistsOnTheInternet/Fixed-Distilabel-intel-orca-dpo-pairs
- NobodyExistsOnTheInternet/ToxicDPOqa
- NobodyExistsOnTheInternet/system-message-DPO
- NobodyExistsOnTheInternet/alpaca-intel-data-dpo

No validation split was held out (`val_set_size: 0`).
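A quick way to inspect one of these datasets is sketched below. The column names are whatever the dataset authors used (the `chatml.intel` format typically expects fields along the lines of system/question/chosen/rejected), so check the actual schema rather than assuming it.

```python
# Sketch: peek at one of the training datasets listed in the config.
from datasets import load_dataset

ds = load_dataset("NobodyExistsOnTheInternet/ToxicDPOqa", split="train")
print(ds.column_names)  # inspect the actual schema
print(ds[0])            # one preference example (prompt plus chosen/rejected responses)
```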
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 32
- optimizer: Paged AdamW (8-bit) with betas=(0.95,0.999) and epsilon=1e-07
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 15031
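The effective batch size follows directly from the per-device settings above, as this short check illustrates:

```python
# Sketch: how total_train_batch_size is derived from the settings above.
micro_batch_size = 1              # train_batch_size per device
gradient_accumulation_steps = 4
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 16  # matches the reported value
```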
### Training results
No evaluation set was configured (`val_set_size: 0`), so no evaluation metrics were logged for this run.
### Framework versions
- PEFT 0.8.2
- Transformers 4.38.0
- Pytorch 2.2.0+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0