iqwiki-kor/Llama3.2-3B-MP-RM

Generated from Trainer


nlee-208 committed on Nov 19, 2024

Commit 86a6992 · verified · 1 Parent(s): a613ef2

Update README.md

Files changed (1)

README.md +6 -22

README.md CHANGED

@@ -15,21 +15,15 @@ should probably proofread and complete it, then remove this comment. -->
 # Llama3.2-3B-MP-RM
-This model is a fine-tuned version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) on the iqwiki-kor/MP-86k dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
+This model is a fine-tuned version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) on the [iqwiki-kor/MP-86k](https://huggingface.co/datasets/iqwiki-kor/MP-86k) dataset.
+## RewardBench Evaluation
+| Model                                                                                               | Chat | Chat-Hard | Safety | Reasoning | Avg. |
+|-----------------------------------------------------------------------------------------------------|---------:|-------:|------:|--------:|--------:|
+| [iqwiki-kor/Llama3.2-3B-MP-RM](https://huggingface.co/iqwiki-kor/Llama3.2-3B-MP-RM/)        |92.5| 81.8| 90.2| 95.5| 90.0|
+| [RLHFlow/ArmoRM-Llama3-8B-v0.1](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1)                         |96.9 |76.8 |90.5 |97.3 |90.4|
 ### Training hyperparameters
@@ -48,13 +42,3 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 1
-### Training results
-### Framework versions
-- Transformers 4.43.4
-- Pytorch 2.4.1+cu124
-- Datasets 2.20.0
-- Tokenizers 0.19.1
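
As a quick sanity check on the RewardBench table added in this commit, the sketch below recomputes the Avg. column. It assumes Avg. is the unweighted mean of the four category scores, rounded to one decimal; the README does not state how the column was computed. The scores are copied from the table, and the helper name `unweighted_avg` is illustrative only.

```python
# Category scores copied from the RewardBench table in the README diff.
SCORES = {
    "iqwiki-kor/Llama3.2-3B-MP-RM": {
        "Chat": 92.5, "Chat-Hard": 81.8, "Safety": 90.2, "Reasoning": 95.5,
    },
    "RLHFlow/ArmoRM-Llama3-8B-v0.1": {
        "Chat": 96.9, "Chat-Hard": 76.8, "Safety": 90.5, "Reasoning": 97.3,
    },
}

# Avg. values as reported in the same table.
REPORTED_AVG = {
    "iqwiki-kor/Llama3.2-3B-MP-RM": 90.0,
    "RLHFlow/ArmoRM-Llama3-8B-v0.1": 90.4,
}


def unweighted_avg(categories):
    """Assumed definition: plain mean of the category scores, one decimal."""
    return round(sum(categories.values()) / len(categories), 1)


if __name__ == "__main__":
    for model, categories in SCORES.items():
        computed = unweighted_avg(categories)
        status = "matches" if computed == REPORTED_AVG[model] else "differs from"
        print(f"{model}: computed {computed} {status} reported {REPORTED_AVG[model]}")
```

Under that assumption, both rows reproduce the reported averages (90.0 and 90.4), so the Avg. column is at least consistent with a simple mean over Chat, Chat-Hard, Safety, and Reasoning.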