metadata
title: Lingo Judge Metric
emoji: 🐨
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
Metric Description
Lingo-Judge is an evaluation metric that aligns closely with human judgement on the LingoQA evaluation suite.
See the project's README at LingoQA for more information.
How to use
This metric requires questions, predictions and references as inputs.
>>> import evaluate
>>> metric = evaluate.load("maysonma/lingo_judge_metric")
>>> questions = ["Are there any traffic lights present? If yes, what is their color?"]
>>> references = [["Yes, green."]]
>>> predictions = ["No."]
>>> results = metric.compute(questions=questions, predictions=predictions, references=references)
>>> print(results)
[-3.38348388671875]
Inputs
- questions (list of str): Input questions.
- predictions (list of str): Model predictions.
- references (list of list of str): Multiple references per question.
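The three inputs are parallel lists, so a shape mistake (for example, passing a flat list of strings as references) is easy to make. The helper below is a minimal sketch of a pre-flight check. `validate_inputs` is a hypothetical name and is not part of the metric, and the requirement that all three lists have the same length is an assumption based on the usage example rather than something this card states.

```python
def validate_inputs(questions, predictions, references):
    """Check input shapes before calling metric.compute().

    Hypothetical helper, not part of the metric itself.
    Assumes one prediction and one list of references per question.
    """
    if not (len(questions) == len(predictions) == len(references)):
        raise ValueError(
            "questions, predictions and references must have the same length, got "
            f"{len(questions)}, {len(predictions)} and {len(references)}"
        )
    for i, (question, prediction, refs) in enumerate(
        zip(questions, predictions, references)
    ):
        if not isinstance(question, str) or not isinstance(prediction, str):
            raise TypeError(f"item {i}: question and prediction must be str")
        # A bare string here usually means the inner list was forgotten.
        if isinstance(refs, str) or not all(isinstance(r, str) for r in refs):
            raise TypeError(f"item {i}: references must be a list of str")


# The inputs from the usage example pass the check.
validate_inputs(
    ["Are there any traffic lights present? If yes, what is their color?"],
    ["No."],
    [["Yes, green."]],
)
```

Running the check first turns a malformed input into an immediate, readable error instead of a failure inside the metric.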
Output Values
- scores (list of float): Score indicating truthfulness.
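Because the metric returns one score per question, results over a whole evaluation set usually need to be aggregated. The sketch below shows one way to do that. `summarize_scores` is a hypothetical helper, and the default `threshold` of 0.0 is an assumption: this card does not document the score scale or a cut-off, so choose a value appropriate to your setup.

```python
def summarize_scores(scores, threshold=0.0):
    """Aggregate per-question scores into summary statistics.

    Hypothetical helper. `threshold` is an assumed cut-off above which a
    prediction is counted as correct; the card does not specify one.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    num_correct = sum(1 for score in scores if score > threshold)
    return {
        "mean_score": sum(scores) / len(scores),
        "fraction_above_threshold": num_correct / len(scores),
    }


# Using the single score from the usage example.
summary = summarize_scores([-3.38348388671875])
print(summary["fraction_above_threshold"])
```

Keeping the threshold as a parameter makes the assumption explicit and easy to change once the intended cut-off is confirmed.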
Citation
@article{marcu2023lingoqa,
title={LingoQA: Video Question Answering for Autonomous Driving},
author={Ana-Maria Marcu and Long Chen and Jan Hünermann and Alice Karnsund and Benoit Hanotte and Prajwal Chidananda and Saurabh Nair and Vijay Badrinarayanan and Alex Kendall and Jamie Shotton and Oleg Sinavski},
journal={arXiv preprint arXiv:2312.14115},
year={2023},
}