Youliang committed on
Commit
0c266cb
1 Parent(s): d706cfa

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -2,18 +2,18 @@
 library_name: peft
 tags:
 - generated_from_trainer
-base_model: meta-llama/Meta-Llama-3-8B-Instruct
+base_model: meta-llama/Meta-Llama-3-70B
 model-index:
-- name: lora_Meta-Llama-3-70B_llama_wizardllm_sorry_3k_safe_3k_66k_diverse_must_refuse_purely_harmful_declar
+- name: lora_Meta-Llama-3-70B_derta
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# lora_Meta-Llama-3-70B_llama_wizardllm_sorry_3k_safe_3k_66k_diverse_must_refuse_purely_harmful_declar
+# lora_Meta-Llama-3-70B_derta
 
-This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the None dataset.
+This model is a fine-tuned version of [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) on the Evol-Instruct and BeaverTails dataset.
 
 ## Model description
 
@@ -33,13 +33,13 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0001
-- train_batch_size: 10
+- train_batch_size: 16
 - eval_batch_size: 1
 - seed: 1
 - distributed_type: multi-GPU
-- num_devices: 6
+- num_devices: 8
 - gradient_accumulation_steps: 2
-- total_train_batch_size: 120
+- total_train_batch_size: 128
 - total_eval_batch_size: 6
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
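
For reference, a minimal sketch of how the hyperparameters listed in the updated card could map onto `transformers.TrainingArguments` and a `peft.LoraConfig`. The LoRA rank, alpha, and target modules are not stated in the card and are placeholders, and the 8-device multi-GPU setup is configured by the launcher (e.g. `accelerate launch` or `torchrun`) rather than by these arguments.

```python
# Sketch only: reproduces the hyperparameters listed in the card; LoRA settings are assumed.
from transformers import TrainingArguments
from peft import LoraConfig

training_args = TrainingArguments(
    output_dir="lora_Meta-Llama-3-70B_derta",
    learning_rate=1e-4,                 # learning_rate: 0.0001
    per_device_train_batch_size=16,     # train_batch_size: 16
    per_device_eval_batch_size=1,       # eval_batch_size: 1
    gradient_accumulation_steps=2,      # gradient_accumulation_steps: 2
    seed=1,                             # seed: 1
    lr_scheduler_type="cosine",         # lr_scheduler_type: cosine
    adam_beta1=0.9,                     # optimizer: Adam with betas=(0.9,0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                  # epsilon=1e-08
    # num_devices: 8 and distributed_type: multi-GPU come from the launch command,
    # not from TrainingArguments.
)

lora_config = LoraConfig(
    r=16,                               # placeholder: rank not given in the card
    lora_alpha=32,                      # placeholder: alpha not given in the card
    target_modules=["q_proj", "v_proj"],  # placeholder: target modules not given
    task_type="CAUSAL_LM",
)
```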