Update README.md
README.md
CHANGED
@@ -24,12 +24,16 @@ We use a curated subset of Open Assistant 2 and translated the dataset into Finn
 
 ### DPO
 
+We use the HelpSteer2 preference dataset, binarized into chosen-rejected pairs using the helpfulness score, as discussed in the [HelpSteer2](https://arxiv.org/abs/2406.08673) paper. We translated the dataset into Finnish using Poro.
+
 - **English**: [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2)
 
 - **Finnish**: TBA
 
 ## Recipes
 
+We used 4 nodes (8 x AMD MI250X) to obtain a global batch size of 128 for SFT and 64 for DPO.
+
 **SFT**
 
 ```