Is it possible/advisable to run direct evaluation without a reference answer?

#1
by afg1 - opened

I have a use case where I'm developing literature summaries for genes. Part of what I'm trying to do is extend summary coverage to things that currently have no summary of any kind, and to do it at reasonably large scale.

I have an assessment rubric which was used by human raters to rate a small subset, but I would like to be able to run a much larger set automatically. I originally looked at Prometheus 1 for this, but its need for a reference answer made me think it was probably not a good fit. I see that the prometheus-eval library has a prompt template without a reference answer, so I guess I can run it without one.

I don't think I saw in the paper a comparison between direct rating with/without a reference - do you have a feel for how much it would impact the metrics?

Also, I can get a 5* answer written for a few examples. Would it be preferable to run with an example summary that is not related to the summary under test, or is that going to confuse things?

Thank you for the awesome paper and models!
