	---
	library_name: peft
	base_model: NousResearch/Meta-Llama-3-70B-Instruct
	license: apache-2.0
	---

	# Model Card for radm/Llama-3-70B-Instruct-AH-lora

	<!-- Provide a quick summary of what the model is/does. -->
	This is a LoRA adapter for NousResearch/Meta-Llama-3-70B-Instruct, fine-tuned to act as a judge for Arena Hard (https://github.com/lm-sys/arena-hard-auto).


	## Model Details

	### Model Description

	<!-- Provide a longer summary of what this model is. -->
	- Developed by: [radm]
	- Model type: [Llama-3-70b]
	- Language(s) (NLP): [English]
	- License: [apache-2.0]
	- Finetuned from model: [NousResearch/Meta-Llama-3-70B-Instruct]

	## Uses

	<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
	Use the arena-hard-local repository (https://github.com/r4dm/arena-hard-local) to run Arena Hard evaluations with a local judge model.
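
For quick experiments outside that repository, the adapter can presumably be attached to the base model with PEFT. The following is a minimal sketch, not the author's evaluation pipeline. It assumes `torch`, `transformers`, `peft`, and `bitsandbytes` are installed and that enough GPU memory is available for a 4-bit 70B model. The helper name `load_judge` is hypothetical.

```python
BASE_MODEL = "NousResearch/Meta-Llama-3-70B-Instruct"
ADAPTER = "radm/Llama-3-70B-Instruct-AH-lora"


def load_judge(base_model: str = BASE_MODEL, adapter: str = ADAPTER):
    """Load the base model in 4-bit and attach the LoRA adapter (hypothetical helper)."""
    # Heavy imports stay inside the function so importing this module is cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    # Assumption: 4-bit inference, mirroring the "Load in 4 bit" training setting.
    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(
        base_model,
        quantization_config=quant,
        device_map="auto",
    )
    # Wrap the quantized base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(model, adapter)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_judge()` downloads the base weights and the adapter on first use, so it is only practical on a machine sized for a 70B model.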

	## Results

	#### Llama-3-70B-Instruct-GPTQ as judge:
	```console
	Llama-3-Instruct-8B-SimPO | score: 78.3 | 95% CI: (-1.5, 1.2) | average #tokens: 545
	SELM-Llama-3-8B-Instruct-iter-3 | score: 72.8 | 95% CI: (-2.1, 1.4) | average #tokens: 606
	Meta-Llama-3-8B-Instruct-f16 | score: 65.3 | 95% CI: (-1.8, 2.1) | average #tokens: 560
	suzume-llama-3-8B-multilingual-orpo-borda-half | score: 63.5 | 95% CI: (-1.6, 2.1) | average #tokens: 978
	Phi-3-medium-128k-instruct | score: 50.0 | 95% CI: (0.0, 0.0) | average #tokens: 801
	suzume-llama-3-8B-multilingual | score: 48.1 | 95% CI: (-2.2, 1.8) | average #tokens: 767
	aya-23-8B | score: 48.0 | 95% CI: (-2.0, 2.1) | average #tokens: 834
	Vikhr-7B-instruct_0.5 | score: 19.6 | 95% CI: (-1.3, 1.5) | average #tokens: 794
	alpindale_gemma-2b-it | score: 11.2 | 95% CI: (-1.0, 0.8) | average #tokens: 425
	```
	#### Llama-3-70B-Instruct-AH-AWQ as judge:
	```console
	Llama-3-Instruct-8B-SimPO | score: 83.8 | 95% CI: (-1.4, 1.3) | average #tokens: 545
	SELM-Llama-3-8B-Instruct-iter-3 | score: 78.8 | 95% CI: (-1.7, 1.9) | average #tokens: 606
	suzume-llama-3-8B-multilingual-orpo-borda-half | score: 71.8 | 95% CI: (-1.7, 2.4) | average #tokens: 978
	Meta-Llama-3-8B-Instruct-f16 | score: 69.8 | 95% CI: (-1.9, 1.7) | average #tokens: 560
	suzume-llama-3-8B-multilingual | score: 54.0 | 95% CI: (-2.1, 2.1) | average #tokens: 767
	aya-23-8B | score: 50.4 | 95% CI: (-1.7, 1.7) | average #tokens: 834
	Phi-3-medium-128k-instruct | score: 50.0 | 95% CI: (0.0, 0.0) | average #tokens: 801
	Vikhr-7B-instruct_0.5 | score: 14.2 | 95% CI: (-1.3, 1.0) | average #tokens: 794
	alpindale_gemma-2b-it | score: 7.9 | 95% CI: (-0.9, 0.8) | average #tokens: 425
	```
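
To compare the two judges model by model, the result lines above can be parsed into numbers. The sketch below is illustrative only. The function names `parse_results` and `score_shift` are hypothetical, the parser accepts the pipe separator with or without a preceding backslash, and it assumes the `95% CI` pair is an offset from the score (which would explain why the 50.0 entry, apparently the baseline, shows `(0.0, 0.0)`).

```python
import re

# One result line: "<model> | score: <x> | 95% CI: (<lo>, <hi>) | average #tokens: <n>"
LINE_RE = re.compile(
    r"^\s*(?P<model>\S+)\s*\\?\|\s*score:\s*(?P<score>[\d.]+)\s*\\?\|\s*"
    r"95% CI:\s*\((?P<lo>-?[\d.]+),\s*(?P<hi>-?[\d.]+)\)\s*\\?\|\s*"
    r"average #tokens:\s*(?P<tokens>\d+)\s*$"
)


def parse_results(text):
    """Return {model: {"score", "ci", "tokens"}} for every line that matches."""
    rows = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank lines and anything that is not a result row
        score = float(m["score"])
        rows[m["model"]] = {
            "score": score,
            # Assumption: CI values are offsets, so add them to the score.
            "ci": (round(score + float(m["lo"]), 1), round(score + float(m["hi"]), 1)),
            "tokens": int(m["tokens"]),
        }
    return rows


def score_shift(base, tuned):
    """Score difference (tuned judge minus base judge) for models in both tables."""
    return {
        name: round(tuned[name]["score"] - base[name]["score"], 1)
        for name in base
        if name in tuned
    }


gptq_judge = parse_results("""
Llama-3-Instruct-8B-SimPO | score: 78.3 | 95% CI: (-1.5, 1.2) | average #tokens: 545
alpindale_gemma-2b-it | score: 11.2 | 95% CI: (-1.0, 0.8) | average #tokens: 425
""")
ah_judge = parse_results("""
Llama-3-Instruct-8B-SimPO | score: 83.8 | 95% CI: (-1.4, 1.3) | average #tokens: 545
alpindale_gemma-2b-it | score: 7.9 | 95% CI: (-0.9, 0.8) | average #tokens: 425
""")
print(score_shift(gptq_judge, ah_judge))
# {'Llama-3-Instruct-8B-SimPO': 5.5, 'alpindale_gemma-2b-it': -3.3}
```

On these two rows the fine-tuned judge scores the strongest model higher and the weakest model lower than the GPTQ judge does. Running the same comparison over the full tables shows the per-model shift for every entry.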

	## Training Details

	### Training Data

	<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
	Datasets:
	- radm/arenahard_gpt4vsllama3
	- radm/truthy-dpo-v0.1-ru
	- jondurbin/truthy-dpo-v0.1

	#### Training Hyperparameters

	- Training regime: [bf16] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
	- Load in 4 bit: [True]
	- Target modules: [all]
	- LoRA rank: [16]
	- Max seq length: [8192]
	- Use gradient checkpointing: [unsloth]
	- Trainer: [ORPOTrainer]
	- Batch size: [1]
	- Gradient accumulation steps: [4]
	- Epochs: [1]
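
As a rough illustration of how these settings combine, the sketch below collects them in a small dataclass and derives the effective batch size. It is not the author's training script. The class name `TrainSettings` and its field names are hypothetical, the default of one GPU is an assumption based on the single A100 listed under Hardware, and the keys returned by `trainer_kwargs` follow `transformers.TrainingArguments`, which the TRL ORPO configuration is understood to extend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSettings:
    # Values copied from the hyperparameter list above.
    lora_rank: int = 16
    max_seq_length: int = 8192
    load_in_4bit: bool = True
    bf16: bool = True
    per_device_batch_size: int = 1
    gradient_accumulation_steps: int = 4
    epochs: int = 1

    def effective_batch_size(self, num_gpus: int = 1) -> int:
        """Number of sequences that contribute to one optimizer step."""
        return self.per_device_batch_size * self.gradient_accumulation_steps * num_gpus

    def trainer_kwargs(self) -> dict:
        """Keyword arguments named after transformers.TrainingArguments fields."""
        return {
            "per_device_train_batch_size": self.per_device_batch_size,
            "gradient_accumulation_steps": self.gradient_accumulation_steps,
            "num_train_epochs": self.epochs,
            "bf16": self.bf16,
        }


settings = TrainSettings()
print(settings.effective_batch_size())  # 4
```

With a batch size of 1 and 4 accumulation steps, each optimizer update on a single GPU sees 4 sequences, which keeps peak memory low at the 8192-token sequence length.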

	### Hardware

	- Hardware Type: [NVIDIA A100 80 GB]
	- Hours used: [11 hours]

	### Framework versions

	- PEFT 0.10.0
