# LLaMA 1B Tulu-3 Finetuned Model

## Model Description

A 1B-parameter LLaMA model fully finetuned on the Tulu-3 dataset from AllenAI. This model builds upon Meta's Llama-3.2-1B architecture and incorporates instruction-following capabilities through the Tulu-3 training mixture.

### Base Model

meta-llama/Llama-3.2-1B

### Dataset

allenai/tulu-3-sft-mixture

### Hardware

4x NVIDIA A100 80GB GPUs

### Training Configuration

```shell
--model_name_or_path meta-llama/Llama-3.2-1B \
--dataset_name "allenai/tulu-3-sft-mixture" \
--learning_rate 1.0e-5 \
--lr_scheduler_type linear \
--warmup_ratio 0.03 \
--weight_decay 0.0 \
--num_train_epochs 2 \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 2 \
--gradient_checkpointing \
--logging_steps 25 \
--bf16 \
--eval_strategy steps \
--eval_steps 5000
```
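
With 4 GPUs, a per-device batch size of 8, and 2 gradient-accumulation steps, these flags imply an effective global batch size of 4 × 8 × 2 = 64. The `linear` scheduler with `--warmup_ratio 0.03` ramps the learning rate up to the peak over the first 3% of optimizer steps, then decays it linearly to zero. A minimal sketch of both calculations (the `total_steps` value here is a hypothetical placeholder, not taken from the actual run):

```python
# Effective batch size and LR schedule implied by the flags above.
# total_steps is a made-up placeholder; the real value depends on
# the dataset size, sequence packing, and number of epochs.

num_gpus = 4
per_device_train_batch_size = 8
gradient_accumulation_steps = 2

effective_batch_size = (
    num_gpus * per_device_train_batch_size * gradient_accumulation_steps
)
print(effective_batch_size)  # 64

def linear_warmup_lr(step, total_steps, peak_lr=1.0e-5, warmup_ratio=0.03):
    """Linear warmup to peak_lr, then linear decay to zero.

    Mirrors the shape of --lr_scheduler_type linear with
    --warmup_ratio 0.03 and --learning_rate 1.0e-5.
    """
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

total_steps = 10_000  # hypothetical
print(linear_warmup_lr(150, total_steps))     # halfway through warmup: 5e-06
print(linear_warmup_lr(10_000, total_steps))  # end of training: 0.0
```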