---
license: llama3
language:
- pl
- en
- es
- de
base_model: meta-llama/Meta-Llama-3-8B-Instruct
library_name: transformers
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/644addfe9279988e0cbc296b/LjJU76dtpJayC1YtSJVqg.png)

### Intro

We have released a collection of [radlab/pLLama3](https://huggingface.co/collections/radlab/pllama-models-66bf856a4511077b9f99cfee) models fine-tuned for Polish. The fine-tuned versions communicate with the user in Polish more precisely than the base meta-llama/Meta-Llama-3 models. The collection includes models in the 8B and 70B architectures.

We provide the 8B model in two configurations:

- radlab/pLLama3-8B-creator, a model that gives fairly short, specific answers to user queries;

- radlab/pLLama3-8B-chat, a more conversational variant that mirrors the behavior of the original meta-llama/Meta-Llama-3-8B-Instruct model (a usage sketch follows below).
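
Both variants load like any other Llama-3 checkpoint in `transformers`. A minimal sketch of querying the chat variant; the generation settings here are illustrative, not tuned recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "radlab/pLLama3-8B-chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Llama-3 checkpoints ship a chat template with the tokenizer.
messages = [{"role": "user", "content": "Opowiedz mi o Krakowie."}]  # "Tell me about Kraków."
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```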

### Dataset

In addition to the instruction datasets publicly available for Polish, we developed our own dataset of about 650,000 instructions, generated semi-automatically from other publicly available datasets.

We also developed a training dataset for the DPO stage: 100k examples that teach the model to prefer correctly written versions of texts over versions containing language errors. An illustrative record is shown below.
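
Each DPO example pairs a prompt with a preferred ("chosen") and a dispreferred ("rejected") completion. A hypothetical record in the common prompt/chosen/rejected layout; field names and content are illustrative, not the actual dataset schema:

```python
# Hypothetical preference record; not the actual dataset schema.
example = {
    "prompt": "Popraw błędy językowe w zdaniu: 'Wczoraj poszlismy do kina.'",
    "chosen": "Wczoraj poszliśmy do kina.",    # correctly written version
    "rejected": "Wczoraj poszlismy do kina.",  # missing diacritic in "poszliśmy"
}
```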

### Training

The training process was divided into two stages:

- Fine-tuning (FT) on the set of 650k Polish instructions, run for 5 epochs.

- After the FT stage, further training with DPO on the 100k preference examples for correct Polish writing, this time for 15k steps (a sketch of the DPO objective follows this list).
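
DPO trains the model to assign a higher relative log-probability, measured against a frozen reference model, to the chosen completion than to the rejected one. The following is a minimal sketch of the objective rather than our training code; in practice a library such as TRL's `DPOTrainer` implements it, and the `beta` value is illustrative:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument is a 1-D tensor of per-sequence log-probabilities
    (summed over tokens) under the policy or the frozen reference model.
    """
    # Beta-scaled log-ratios of policy vs. reference ("implicit rewards").
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Maximize the margin between chosen and rejected rewards.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
    return loss, chosen_rewards.detach(), rejected_rewards.detach()
```

In this formulation, the `eval/rewards/*` metrics reported below are these beta-scaled log-ratios: `rewards/margins` is the mean gap between the chosen and rejected rewards, and `rewards/accuracies` is the fraction of pairs in which the chosen reward is higher.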

The models we released are the ones obtained after both the FT and DPO stages.

Evaluation metrics after the FT stage:

- `eval/loss`: `0.8690009713172913`

- `eval/runtime`: `464.5158`

- `eval/samples_per_second`: `8.611`

- `eval/steps_per_second`: `8.611`

Evaluation metrics after the DPO stage:

- `eval/logits/chosen`: `0.1370937079191208`

- `eval/logits/rejected`: `0.07430506497621536`

- `eval/logps/chosen`: `-454.11962890625`

- `eval/logps/rejected`: `-764.1261596679688`

- `eval/loss`: `0.05717926099896431`

- `eval/rewards/accuracies`: `0.9372459053993224`

- `eval/rewards/chosen`: `-26.75682830810547`

- `eval/rewards/margins`: `32.37759780883789`

- `eval/rewards/rejected`: `-59.134429931640625`

- `eval/runtime`: `1386.3177`

- `eval/samples_per_second`: `2.838`

- `eval/steps_per_second`: `1.42`

### Outro

Read more in Polish on our [blog](https://radlab.dev/2024/08/16/pllama3-8b-70b-genai-dla-polskiego/). Enjoy!