# LLaMA 1B Tulu-3 Finetuned Model
## Model Description
A 1B-parameter Llama model fully fine-tuned on the Tulu-3 SFT mixture from AllenAI. The model builds on Meta's Llama-3.2-1B base model and acquires instruction-following capabilities from the Tulu-3 training mixture.
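Below is a minimal inference sketch using Hugging Face `transformers`. The repository id is a placeholder (the actual id of this checkpoint is not stated in this card), and it assumes the saved tokenizer includes a chat template; adjust both to your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual id of this fine-tuned checkpoint.
model_id = "your-org/llama-1b-tulu3-sft"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumes the tokenizer was saved with a chat template (e.g. the Tulu format);
# otherwise, format the prompt manually.
messages = [{"role": "user", "content": "Summarize what the Tulu-3 SFT mixture contains."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```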
### Base Model
meta-llama/Llama-3.2-1B
### Dataset
allenai/tulu-3-sft-mixture
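To inspect the training data, the mixture can be loaded directly with the `datasets` library. A quick sketch, assuming the standard train split and the `messages` column layout as published on the Hub:

```python
from datasets import load_dataset

# Load the SFT mixture used for fine-tuning.
ds = load_dataset("allenai/tulu-3-sft-mixture", split="train")

print(ds)                      # row count and column names
print(ds[0]["messages"][:2])   # first turns of the first conversation ({"role", "content"} dicts)
```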
### Hardware
4x NVIDIA A100 80GB GPUs
### Training Configuration
```bash
--model_name_or_path meta-llama/Llama-3.2-1B \
--dataset_name "allenai/tulu-3-sft-mixture" \
--learning_rate 1.0e-5 \
--lr_scheduler_type linear \
--warmup_ratio 0.03 \
--weight_decay 0.0 \
--num_train_epochs 2 \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 2 \
--gradient_checkpointing \
--logging_steps 25 \
--bf16 \
--eval_strategy steps \
--eval_steps 5000
```
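The flags above follow the interface of TRL-style SFT training scripts; the exact script used is not included in this card. As a sketch only, an equivalent setup with `trl`'s `SFTTrainer` (an assumption, not necessarily the original pipeline) could look like:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Small held-out slice for the step-based evaluation configured below
# (the original eval split is not specified in this card).
dataset = load_dataset("allenai/tulu-3-sft-mixture", split="train").train_test_split(
    test_size=0.01, seed=42
)

config = SFTConfig(
    output_dir="llama-1b-tulu3-sft",   # assumed output path
    learning_rate=1.0e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    weight_decay=0.0,
    num_train_epochs=2,
    per_device_train_batch_size=8,     # x4 GPUs x2 accumulation -> effective batch size 64
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    logging_steps=25,
    bf16=True,
    eval_strategy="steps",
    eval_steps=5000,
)

# Note: training on the conversational "messages" format requires a chat template
# (e.g. the Tulu one) to be set on the base model's tokenizer.
trainer = SFTTrainer(
    model="meta-llama/Llama-3.2-1B",
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
```

On the 4x A100 setup listed above, a run like this would typically be launched with `accelerate launch --num_processes 4 ...` or `torchrun --nproc_per_node 4 ...`.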