|
- 1289.4068 seconds (21.49 minutes) used for training.
- Peak reserved memory = 9.545 GB.
- Peak reserved memory for training = 4.018 GB.
- Peak reserved memory % of max memory = 43.058 %.
- Peak reserved memory for training % of max memory = 18.125 %.
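These figures follow the pattern the standard Unsloth notebooks use to report memory usage, built on PyTorch's CUDA statistics. A minimal sketch of how they are typically computed, assuming `start_gpu_memory` (peak reserved memory captured before training) and `trainer_stats` (the value returned by `trainer.train()`) exist from earlier cells:

```python
import torch

# Reproduce the summary above from PyTorch's CUDA memory counters.
# `start_gpu_memory` and `trainer_stats` are assumed from earlier cells.
gpu_stats = torch.cuda.get_device_properties(0)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_for_training = round(used_memory - start_gpu_memory, 3)

print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_for_training} GB.")
print(f"Peak reserved memory % of max memory = {round(used_memory / max_memory * 100, 3)} %.")
print(f"Peak reserved memory for training % of max memory = {round(used_for_training / max_memory * 100, 3)} %.")
```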
|
|
|
```python
import torch
from transformers import TrainingArguments

args = TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps = 10,        # increased the number of warmup steps
    max_steps = 200,          # increased the total number of steps
    learning_rate = 1e-4,     # reduced the learning rate
    fp16 = not torch.cuda.is_bf16_supported(),
    bf16 = torch.cuda.is_bf16_supported(),
    logging_steps = 1,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "linear",
    seed = 42,
    output_dir = "outputs",
)
```
|
|
|
|
|
```
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 399 | Num Epochs = 4
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 200
 "-____-"     Number of trainable parameters = 20,971,520

[200/200 21:17, Epoch 4/4]
```
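The banner's numbers are internally consistent; a short check using only the values reported above:

```python
# Cross-check the banner: 2 sequences/device x 4 accumulation steps = 8
# sequences per optimizer step; 200 steps x 8 = 1600 sequences, which is
# roughly 4 passes over the 399 training examples (the reported 4 epochs).
per_device_batch = 2
grad_accum = 4
total_steps = 200
num_examples = 399

effective_batch = per_device_batch * grad_accum   # 8
examples_seen = effective_batch * total_steps     # 1600
print(examples_seen / num_examples)               # ~4.01 epochs
```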
|
```
Step  Training Loss
   1       2.027900
   2       2.008700
   3       1.946100
   4       1.924700
   5       1.995000
   6       1.999000
   7       1.870100
   8       1.891400
   9       1.807600
  10       1.723200
  11       1.665100
  12       1.541000
  13       1.509100
  14       1.416600
  15       1.398600
  16       1.233200
  17       1.172100
  18       1.272100
  19       1.146000
  20       1.179000
  21       1.206400
  22       1.095400
  23       0.937300
  24       1.214300
  25       1.040200
  26       1.183400
  27       1.033900
  28       0.953100
  29       0.935700
  30       0.962200
  31       0.908900
  32       0.924900
  33       0.931000
  34       1.011300
  35       0.951900
  36       0.936000
  37       0.903000
  38       0.906900
  39       0.945700
  40       0.827000
  41       0.931800
  42       0.919600
  43       0.926900
  44       0.932900
  45       0.872700
  46       0.795200
  47       0.888700
  48       0.956800
  49       1.004200
  50       0.859500
  51       0.802500
  52       0.855400
  53       0.885500
  54       1.026600
  55       0.844100
  56       0.879800
  57       0.797400
  58       0.885300
  59       0.842800
  60       0.861600
  61       0.789100
  62       0.861600
  63       0.856700
  64       0.929200
  65       0.782500
  66       0.713600
  67       0.781000
  68       0.765100
  69       0.784700
  70       0.869500
  71       0.742900
  72       0.787900
  73       0.750800
  74       0.931700
  75       0.713000
  76       0.832100
  77       0.928300
  78       0.777600
  79       0.694000
  80       0.835400
  81       0.822000
  82       0.754600
  83       0.813400
  84       0.868800
  85       0.732400
  86       0.803700
  87       0.694400
  88       0.771300
  89       0.864400
  90       0.646700
  91       0.690800
  92       0.695000
  93       0.732300
  94       0.766900
  95       0.864100
  96       0.867200
  97       0.774300
  98       0.797700
  99       0.772100
 100       0.906700
 101       0.693400
 102       0.685500
 103       0.712200
 104       0.678400
 105       0.761900
 106       0.705300
 107       0.775700
 108       0.627600
 109       0.599300
 110       0.615100
 111       0.618200
 112       0.668700
 113       0.699900
 114       0.577000
 115       0.711600
 116       0.692900
 117       0.585400
 118       0.646400
 119       0.569200
 120       0.752300
 121       0.745000
 122       0.690100
 123       0.744700
 124       0.665800
 125       0.866100
 126       0.707400
 127       0.679300
 128       0.591400
 129       0.655100
 130       0.734000
 131       0.637900
 132       0.733900
 133       0.652500
 134       0.685400
 135       0.641300
 136       0.608200
 137       0.754100
 138       0.753700
 139       0.671000
 140       0.767200
 141       0.668700
 142       0.630300
 143       0.734700
 144       0.767700
 145       0.722200
 146       0.694400
 147       0.710100
 148       0.696300
 149       0.612600
 150       0.670400
 151       0.512900
 152       0.675100
 153       0.579900
 154       0.622900
 155       0.652500
 156       0.649200
 157       0.546700
 158       0.521600
 159       0.522200
 160       0.589400
 161       0.552600
 162       0.630700
 163       0.595600
 164       0.614300
 165       0.489400
 166       0.634500
 167       0.620800
 168       0.618600
 169       0.637900
 170       0.553900
 171       0.656000
 172       0.644000
 173       0.694300
 174       0.608900
 175       0.673000
 176       0.612500
 177       0.654200
 178       0.639200
 179       0.599100
 180       0.642100
 181       0.529700
 182       0.614000
 183       0.582900
 184       0.765100
 185       0.502700
 186       0.564300
 187       0.740200
 188       0.636100
 189       0.638800
 190       0.560100
 191       0.620000
 192       0.712800
 193       0.531000
 194       0.591600
 195       0.608600
 196       0.671800
 197       0.572900
 198       0.600900
 199       0.586800
 200       0.545900
```
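The training loss falls steadily from about 2.03 at step 1 to roughly 0.55 by step 200, with no sign of divergence. One way to visualize the curve is to plot the trainer's log history; a minimal sketch, assuming the `trainer` object is still in scope and matplotlib is available:

```python
import matplotlib.pyplot as plt

# Plot the per-step training loss recorded by the Trainer.
history = [e for e in trainer.state.log_history if "loss" in e]
steps = [e["step"] for e in history]
losses = [e["loss"] for e in history]

plt.plot(steps, losses)
plt.xlabel("Step")
plt.ylabel("Training loss")
plt.title("Fine-tuning loss over 200 steps")
plt.show()
```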
|
|
|
--- |
|
base_model: unsloth/llama-3-8b-bnb-4bit |
|
language: |
|
- en |
|
license: apache-2.0 |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- llama |
|
- gguf |
|
--- |
|
|
|
# Uploaded model |
|
|
|
- **Developed by:** Mathoufle13 |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** unsloth/llama-3-8b-bnb-4bit
|
|
|
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
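To run the model, it can be loaded back with Unsloth's `FastLanguageModel`. A minimal sketch; the repository id below is a placeholder, since the actual Hub repo id is not stated here:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Mathoufle13/model-name",  # placeholder repo id
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode
```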
|
|
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |
|
|