---
license: llama3
language:
- pl
- en
- es
- de
base_model: meta-llama/Meta-Llama-3-8B-Instruct
library_name: transformers
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/644addfe9279988e0cbc296b/LjJU76dtpJayC1YtSJVqg.png)

### Intro
We have released a collection of [radlab/pLLama3](https://huggingface.co/collections/radlab/pllama-models-66bf856a4511077b9f99cfee) models fine-tuned for Polish. The fine-tuned versions communicate with the user in Polish more precisely than the base meta-llama/Meta-Llama-3 models. The collection includes models in the 8B and 70B sizes.
The 8B models are available in two configurations:
- radlab/pLLama3-8B-creator, a model that gives fairly short, specific answers to user queries;
- radlab/pLLama3-8B-chat, a more conversational variant that mirrors the behavior of the original meta-llama/Meta-Llama-3-8B-Instruct model.
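
Since the models follow the Llama-3 instruct format, a minimal sketch of loading the chat variant with `transformers` might look as follows (generation parameters are illustrative, not the authors' recommended settings):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "radlab/pLLama3-8B-chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Standard Llama-3 chat template; the prompt is an arbitrary Polish example.
messages = [
    {"role": "user", "content": "Opisz krótko, czym jest fotosynteza."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```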

### Dataset
In addition to the instruction datasets publicly available for Polish, we developed our own dataset of about 650,000 instructions, generated semi-automatically from other publicly available datasets.
We also built a training dataset for the DPO stage, containing 100k preference pairs that teach the model to prefer correctly written Polish texts over versions containing language errors.
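
A single preference pair could be shaped like the sketch below (the field names follow the common `trl` convention; the actual dataset schema and contents are an assumption):

```python
# Hypothetical DPO preference pair: "chosen" is correct Polish,
# "rejected" is the same sentence with typical language errors.
example = {
    "prompt": "Napisz zdanie o wiośnie.",
    "chosen": "Wiosną dni stają się coraz dłuższe i cieplejsze.",
    "rejected": "Wiosnom dni stajom sie coras dłusze i cieplejsze.",
}
```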

### Learning
The training process was divided into two stages:
- Supervised fine-tuning (FT) on the 650k Polish instructions, for 5 epochs.
- After the FT stage, further training with DPO on the 100k preference pairs of correct Polish writing, for 15k steps.

The released models are the checkpoints obtained after both the FT and DPO stages.
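
For reference, the DPO stage could be set up with `trl` roughly as sketched below. This is a minimal, assumed setup (recent `trl` API; the checkpoint path, batch size, and dataset file are hypothetical), with only the 15k-step budget taken from the description above:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Hypothetical checkpoint produced by the FT stage.
model_id = "path/to/pllama3-8b-after-ft"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Dataset with "prompt"/"chosen"/"rejected" columns, as sketched above.
train_dataset = load_dataset("json", data_files="dpo_pairs.jsonl", split="train")

config = DPOConfig(
    output_dir="pllama3-8b-dpo",
    max_steps=15_000,  # training budget reported above; other values assumed
    per_device_train_batch_size=2,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older trl versions use tokenizer= instead
)
trainer.train()
```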

Post-FT learning metrics:
 - `eval/loss`: `0.8690009713172913`
 - `eval/runtime`: `464.5158`
 - `eval/samples_per_second`: `8.611`
 - `eval/steps_per_second`: `8.611`

Post-DPO learning metrics:
 - `eval/logits/chosen`: `0.1370937079191208`
 - `eval/logits/rejected`: `0.07430506497621536`
 - `eval/logps/chosen`: `-454.11962890625`
 - `eval/logps/rejected`: `-764.1261596679688`
 - `eval/loss`: `0.05717926099896431`
 - `eval/rewards/accuracies`: `0.9372459053993224`
 - `eval/rewards/chosen`: `-26.75682830810547`
 - `eval/rewards/margins`: `32.37759780883789`
 - `eval/rewards/rejected`: `-59.134429931640625`
 - `eval/runtime`: `1,386.3177`
 - `eval/samples_per_second`: `2.838`
 - `eval/steps_per_second`: `1.42`

### Outro

Read more in Polish on our [blog](https://radlab.dev/2024/08/16/pllama3-8b-70b-genai-dla-polskiego/). Enjoy!