---
library_name: peft
base_model: NousResearch/Meta-Llama-3-70B-Instruct
license: apache-2.0
---
# Model Card for radm/Llama-3-70B-Instruct-AH-lora
<!-- Provide a quick summary of what the model is/does. -->
This is a LoRA adapter for NousResearch/Meta-Llama-3-70B-Instruct, fine-tuned to act as a judge on [Arena Hard](https://github.com/lm-sys/arena-hard-auto).
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** radm
- **Model type:** Llama-3-70B
- **Language(s) (NLP):** English
- **License:** apache-2.0
- **Finetuned from model:** NousResearch/Meta-Llama-3-70B-Instruct
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
This adapter is intended to be used as a judge model for Arena Hard (https://github.com/lm-sys/arena-hard-auto) evaluations.
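A minimal loading sketch with `transformers` and `peft`, assuming 4-bit inference on a single GPU; the judge prompt below is purely illustrative, not the exact template used in training (the real one lives in the arena-hard-auto repository):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "NousResearch/Meta-Llama-3-70B-Instruct"
adapter_id = "radm/Llama-3-70B-Instruct-AH-lora"

# Quantize the 70B base model to 4-bit so it fits on a single 80 GB GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(base_model, adapter_id)

# Illustrative judge-style prompt; see arena-hard-auto for the real template
messages = [
    {"role": "system", "content": "You are an impartial judge comparing two assistant answers."},
    {"role": "user", "content": "Question: ...\n\nAnswer A: ...\n\nAnswer B: ..."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```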
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
Datasets:
- radm/arenahard_gpt4vsllama3
- radm/truthy-dpo-v0.1-ru
- jondurbin/truthy-dpo-v0.1
#### Training Hyperparameters
- **Training regime:** bf16
- **Load in 4 bit:** True
- **Target modules:** all
- **LoRA rank:** 16
- **Max seq length:** 8192
- **Use gradient checkpointing:** unsloth
- **Trainer:** ORPOTrainer
- **Batch size:** 1
- **Gradient accumulation steps:** 4
- **Epochs:** 1

A hedged sketch of how these hyperparameters could map onto a TRL `ORPOTrainer` run with a PEFT LoRA config follows this list.
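The card notes unsloth gradient checkpointing, so the actual run likely used unsloth's wrappers rather than plain TRL; the dataset name comes from the Training Data section, and `lora_alpha` is illustrative since the card does not state it:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

base_id = "NousResearch/Meta-Llama-3-70B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# LoRA on all linear projections, rank 16, as listed above
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,                # illustrative; the card does not state alpha
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# One of the three preference datasets from the Training Data section
train_dataset = load_dataset("radm/arenahard_gpt4vsllama3", split="train")

args = ORPOConfig(
    output_dir="llama3-70b-ah-lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    max_length=8192,
    bf16=True,
    gradient_checkpointing=True,  # the actual run used unsloth's variant
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```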
### Results
[More Information Needed]
## Hardware
- **Hardware Type:** NVIDIA A100 80 GB
- **Hours used:** 11
### Framework versions
- PEFT 0.10.0