Update README.md
README.md
CHANGED
@@ -19,7 +19,7 @@ Subsequently the Nemotron-4-340B-Instruct model went through additional alignmen
 
 - Supervised Fine-tuning (SFT)
 - Direct Policy Optimization (DPO)
-- Additional in-house alignment techniques
+- Additional in-house alignment techniques (Publication work in progress)
 
 This results in a final model that is aligned for human chat preferences, improvements in mathematical reasoning, coding and instruction following.
 