Translation
COMET
zouharvi committed on
Commit
5b8c534
verified
1 Parent(s): 97bbf81

Update README.md

Files changed (1)
  1. README.md +54 -4
README.md CHANGED
@@ -100,11 +100,61 @@ base_model:
  - FacebookAI/xlm-roberta-large
  ---
 
- # PreCOMET-var
 
  This is a source-only COMET model used for efficient evaluation subset selection.
- It is not compatible with the upstream [github.com/Unbabel/COMET/](https://github.com/Unbabel/COMET/) and to run it you have to install [github.com/zouharvi/PreCOMET](https://github.com/zouharvi/PreCOMET)
 
- The primary use of this model is from the [subset2evaluate](https://github.com/zouharvi/subset2evaluate) package.
 
- Further description TODO.
  - FacebookAI/xlm-roberta-large
  ---
 
+ # PreCOMET-var [![Paper](https://img.shields.io/badge/📜%20paper-481.svg)](https://arxiv.org/abs/2501.18251)
 
  This is a source-only COMET model used for efficient evaluation subset selection.
+ Specifically, this model predicts the expected variance in human scores of translations. It is trained on direct assessment scores from WMT campaigns up to WMT2022.
+ The higher the score, the more useful the segment is for evaluation, because it is more likely to distinguish between systems.
+ It is not compatible with the original Unbabel COMET; to run it you have to install [github.com/zouharvi/PreCOMET](https://github.com/zouharvi/PreCOMET):
+ ```bash
+ pip install git+https://github.com/zouharvi/PreCOMET.git
+ ```
 
+ You can then use it in Python:
+ ```python
+ import precomet
+ model = precomet.load_from_checkpoint(precomet.download_model("zouharvi/PreCOMET-var"))
+ model.predict([
+     {"src": "This is an easy source sentence."},
+     {"src": "this is a much more complicated source sen-tence that will pro·bably lead to loww scores 🤪"}
+ ])["scores"]
+ > [70.99381256103516, 70.99385833740234]
+ ```
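Downstream of `model.predict`, subset selection is just ranking segments by the predicted score and keeping the top-k. A minimal sketch of that step, independent of the PreCOMET package; the segment names and scores here are made up for illustration:

```python
# Sketch of the selection step: rank segments by a predicted per-segment
# score (e.g. expected human-score variance) and keep the top-k.
# Segments and scores below are hypothetical, not real model outputs.
def select_top_k(segments, scores, k):
    """Return the k segments with the highest predicted scores."""
    ranked = sorted(zip(segments, scores), key=lambda pair: pair[1], reverse=True)
    return [segment for segment, _ in ranked[:k]]

segments = ["source A", "source B", "source C", "source D"]
scores = [70.2, 71.5, 69.8, 70.9]  # hypothetical per-segment predictions
print(select_top_k(segments, scores, 2))  # ['source B', 'source D']
```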
 
+ The primary use of this model is through the [subset2evaluate](https://github.com/zouharvi/subset2evaluate) package:
+
+ ```python
+ import subset2evaluate
+
+ data_full = subset2evaluate.utils.load_data("wmt23/en-cs")
+ data_random = subset2evaluate.select_subset.basic(data_full, method="random")
+ subset2evaluate.evaluate.eval_subset_clusters(data_random[:100])
+ > 1
+ subset2evaluate.evaluate.eval_subset_correlation(data_random[:100], data_full)
+ > 0.71
+ ```
+ With a budget of only 100 segments, random selection yields just one cluster and a system-level Spearman correlation of 0.71. However, using this model:
+ ```python
+ data_precomet = subset2evaluate.select_subset.basic(data_full, method="precomet_var")
+ subset2evaluate.evaluate.eval_subset_clusters(data_precomet[:100])
+ > 2
+ subset2evaluate.evaluate.eval_subset_correlation(data_precomet[:100], data_full)
+ > 0.92
+ ```
+ we obtain a higher correlation and more clusters.
+ You can expect an even bigger effect at larger scale, as described in the paper.
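The system-level correlation above compares the ranking of MT systems induced by the subset with the ranking on the full test set. A self-contained sketch of that comparison, using a pure-Python Spearman correlation and hypothetical system scores (this is not the subset2evaluate implementation):

```python
# Sketch: Spearman correlation between system rankings obtained from a
# subset vs. from the full test set. Scores are hypothetical; assumes no ties.
def spearman(x, y):
    """Spearman rank correlation for two score lists without ties."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, i in enumerate(order):
            result[i] = rank
        return result
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Average system scores on the full data vs. on a small subset.
full_scores = [82.1, 79.4, 75.0, 70.2]    # systems A, B, C, D
subset_scores = [81.0, 80.2, 70.5, 72.0]  # the subset flips systems C and D
print(spearman(full_scores, subset_scores))  # 0.8
```
A good selection method keeps this correlation close to 1.0 even with few segments.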
+
+ This work is described in [How to Select Datapoints for Efficient Human Evaluation of NLG Models?](https://arxiv.org/abs/2501.18251).
+ Cite as:
+ ```
+ @misc{zouhar2025selectdatapointsefficienthuman,
+   title={How to Select Datapoints for Efficient Human Evaluation of NLG Models?},
+   author={Vilém Zouhar and Peng Cui and Mrinmaya Sachan},
+   year={2025},
+   eprint={2501.18251},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL},
+   url={https://arxiv.org/abs/2501.18251},
+ }
+ ```