	---
	datasets:
	- natolambert/skywork-preferences-80k-v0.1-cleaned
	- allenai/preference-test-sets
	---

	# MetaMetrics-RM-v1.0

	+ Authors: [Genta Indra Winata](https://gentawinata.com/), [David Anugraha](https://davidanugraha.github.io/), [Lucky Susanto](https://luckysusanto.github.io/), [Garry Kuwanto](https://gkuwanto.github.io/), [Derry Tanti Wijaya](https://derrywijaya.github.io/)
	+ Paper: https://arxiv.org/abs/2410.02381
	+ Model: [meta-metrics/MetaMetrics-RM-v1.0](https://huggingface.co/meta-metrics/MetaMetrics-RM-v1.0)
	+ Datasets:
	  - [natolambert/skywork-preferences-80k-v0.1-cleaned](https://huggingface.co/datasets/natolambert/skywork-preferences-80k-v0.1-cleaned)
	  - [allenai/preference-test-sets](https://huggingface.co/datasets/allenai/preference-test-sets)
	+ Code Repository: https://github.com/meta-metrics/metametrics

	## RewardBench Leaderboard

	| Model | Score | Chat | Chat Hard | Safety | Reasoning |
	|:-------|:------|:-----|:----------|:-------|:----------|
	| nvidia/Llama-3.1-Nemotron-70B-Reward | 94.1 | 97.5 | 85.7 | 95.1 | 98.1 |
	| meta-metrics/MetaMetrics-RM-v1.0 | 93.5 | 98.9 | 86.2 | 90.7 | 98.2 |
	| SF-Foundation/TextEval-Llama3.1-70B | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
	| RLHFlow/ArmoRM-Llama3-8B-v0.1 | 90.4 | 96.9 | 76.8 | 90.5 | 97.3 |
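
The Score column in the table above appears consistent with an unweighted mean of the four category columns (Chat, Chat Hard, Safety, Reasoning), up to one-decimal rounding. This is an observation from the listed numbers, not a statement of how RewardBench computes its score. A minimal sketch that checks it, with the helper names `mean_of_categories` and `max_deviation` being hypothetical:

```python
# Rows copied from the leaderboard table:
# (model, score, chat, chat_hard, safety, reasoning)
ROWS = [
    ("nvidia/Llama-3.1-Nemotron-70B-Reward", 94.1, 97.5, 85.7, 95.1, 98.1),
    ("meta-metrics/MetaMetrics-RM-v1.0", 93.5, 98.9, 86.2, 90.7, 98.2),
    ("SF-Foundation/TextEval-Llama3.1-70B", 93.5, 94.1, 90.1, 93.2, 96.4),
    ("RLHFlow/ArmoRM-Llama3-8B-v0.1", 90.4, 96.9, 76.8, 90.5, 97.3),
]


def mean_of_categories(row):
    """Unweighted mean of the four category scores (assumed aggregation)."""
    categories = row[2:]
    return sum(categories) / len(categories)


def max_deviation(rows):
    """Largest absolute gap between the reported Score and the category mean."""
    return max(abs(mean_of_categories(r) - r[1]) for r in rows)


if __name__ == "__main__":
    for r in ROWS:
        print(f"{r[0]}: reported {r[1]}, category mean {mean_of_categories(r):.2f}")
    print(f"max deviation: {max_deviation(ROWS):.3f}")
```

Every reported score lies within rounding distance of its category mean, which is why MetaMetrics-RM-v1.0 and SF-Foundation/TextEval-Llama3.1-70B tie at 93.5 despite different per-category profiles.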

	## Citation

	If you find this work useful for your research, please consider citing:
	```
	@article{winata2024metametrics,
	title={MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences},
	author={Winata, Genta Indra and Anugraha, David and Susanto, Lucky and Kuwanto, Garry and Wijaya, Derry Tanti},
	journal={arXiv preprint arXiv:2410.02381},
	year={2024}
	}
	```