radm/Llama-3-70B-Instruct-AH-lora

radm committed on Jun 4

Commit e209d7d

1 Parent(s): 2525d11

Update README.md

Files changed (1)

README.md +24 -26

@@ -26,31 +26,7 @@ This is a LORA adapter for NousResearch/Meta-Llama-3-70B-Instruct, fine-tuned to
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 Use repository (https://github.com/r4dm/arena-hard-local) for evaluate with local judge model.
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-Datasets:
-- radm/arenahard_gpt4vsllama3
-- radm/truthy-dpo-v0.1-ru
-- jondurbin/truthy-dpo-v0.1
-#### Training Hyperparameters
-- **Training regime:** [bf16] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-- **Load in 4 bit:** [True]
-- **Target modules:** [all]
-- **LoRA rank:** [16]
-- **Max seq length:** [8192]
-- **Use gradient checkpointing:** [unsloth]
-- **trainer:** [ORPOTrainer]
-- **Batch size:** [1]
-- **Gradient accumulation steps:** [4]
-- **Epochs:** [1]
-### Results
 #### Llama-3-70B-Instruct-GPTQ as judge:
 ```console
@@ -77,8 +53,30 @@ Vikhr-7B-instruct_0.5                              | score: 14.2  | 95% CI:   (-
 alpindale_gemma-2b-it                              | score:  7.9  | 95% CI:   (-0.9, 0.8)   | average #tokens: 425
 ```
-## Hardware
 - **Hardware Type:** [Nvidia A100 80 gb]
 - **Hours used:** [11 hours]

 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 Use repository (https://github.com/r4dm/arena-hard-local) for evaluate with local judge model.
+## Results
 #### Llama-3-70B-Instruct-GPTQ as judge:
 ```console
 alpindale_gemma-2b-it                              | score:  7.9  | 95% CI:   (-0.9, 0.8)   | average #tokens: 425
 ```
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+Datasets:
+- radm/arenahard_gpt4vsllama3
+- radm/truthy-dpo-v0.1-ru
+- jondurbin/truthy-dpo-v0.1
+#### Training Hyperparameters
+- **Training regime:** [bf16] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+- **Load in 4 bit:** [True]
+- **Target modules:** [all]
+- **LoRA rank:** [16]
+- **Max seq length:** [8192]
+- **Use gradient checkpointing:** [unsloth]
+- **trainer:** [ORPOTrainer]
+- **Batch size:** [1]
+- **Gradient accumulation steps:** [4]
+- **Epochs:** [1]
+### Hardware
 - **Hardware Type:** [Nvidia A100 80 gb]
 - **Hours used:** [11 hours]
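
For reference, the hyperparameters listed in the README can be collected into a small Python structure. This is only a sketch: the `TrainingSettings` class, its field names, and the `num_devices` argument are hypothetical and are not taken from the author's training script. The values are copied from the list above, and the README does not state how many GPUs were used, so a single device is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    """Hypothetical container; values copied from the README's hyperparameter list."""

    training_regime: str = "bf16"
    load_in_4bit: bool = True
    target_modules: str = "all"
    lora_rank: int = 16
    max_seq_length: int = 8192
    gradient_checkpointing: str = "unsloth"
    trainer: str = "ORPOTrainer"
    batch_size: int = 1
    gradient_accumulation_steps: int = 4
    epochs: int = 1

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Sequences contributing to one optimizer step.
        # num_devices=1 is an assumption; the README gives no GPU count.
        return self.batch_size * self.gradient_accumulation_steps * num_devices


settings = TrainingSettings()
print(settings.effective_batch_size())
```

With a batch size of 1 and 4 gradient accumulation steps, each optimizer step would cover 4 sequences on a single device.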