---
title: Model Evaluator
emoji: π
colorFrom: red
colorTo: red
sdk: streamlit
sdk_version: 1.10.0
app_file: app.py
---
# Model Evaluator
Submit evaluation jobs to AutoTrain from the Hugging Face Hub
> ⚠️ This project has been archived. If you want to evaluate LLMs, check out this collection of leaderboards.
## Supported tasks

The table below shows which tasks are currently supported for evaluation in the AutoTrain backend:

| Task | Supported |
|---|---|
| `binary_classification` | ✅ |
| `multi_class_classification` | ✅ |
| `multi_label_classification` | ✅ |
| `entity_extraction` | ✅ |
| `extractive_question_answering` | ✅ |
| `translation` | ✅ |
| `summarization` | ✅ |
| `image_binary_classification` | ✅ |
| `image_multi_class_classification` | ✅ |
| `text_zero_shot_evaluation` | ✅ |
## Installation

To run the application locally, first clone this repository and install the dependencies as follows:

```
pip install -r requirements.txt
```

Next, copy the example file of environment variables:

```
cp .env.template .env
```
and set the `HF_TOKEN` variable with a valid API token from the `autoevaluator` bot user.
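For reference, the resulting `.env` should contain a line along these lines (the value below is a placeholder, and the template may define additional variables not shown here):

```
HF_TOKEN=<API token of the autoevaluator bot user>
```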
Finally, spin up the application by running:

```
streamlit run app.py
```
## Usage
Evaluation on the Hub involves two main steps:
1. Submitting an evaluation job via the UI. This creates an AutoTrain project with `N` models for evaluation. At this stage, the dataset is also processed and prepared for evaluation.
2. Triggering the evaluation itself once the dataset is processed.

From the user perspective, only step (1) is needed since step (2) is handled by a cron job on GitHub Actions that executes the `run_evaluation_jobs.py` script every 15 minutes.
See below for details on manually triggering evaluation jobs.
## Triggering an evaluation
To evaluate the models in an AutoTrain project, run:
```
python run_evaluation_jobs.py
```
This will download the `autoevaluate/evaluation-job-logs` dataset from the Hub and check which evaluation projects are ready for evaluation (i.e. those whose dataset has been processed).
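The check the script performs can be sketched roughly as follows. This is a simplified illustration rather than the actual implementation: the column name and status value used for filtering are assumptions, not documented fields of the `evaluation-job-logs` dataset.

```python
from datasets import load_dataset

# Download the log of submitted evaluation jobs from the Hub
logs = load_dataset("autoevaluate/evaluation-job-logs", split="train")

# Keep only projects whose dataset has finished processing
# (the "status" field and its value here are hypothetical placeholders)
ready = [row for row in logs if row.get("status") == "dataset_processed"]

print(f"{len(ready)} projects are ready for evaluation")
```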
## AutoTrain configuration details
Models are evaluated by the `autoevaluator` bot user in AutoTrain, with the payload sent to the endpoint defined by the `AUTOTRAIN_BACKEND_API` environment variable. Evaluation projects are created and run on either the `prod` or `staging` environments. You can view the status of projects in the AutoTrain UI by navigating to one of the links below (ask internally for access to the staging UI):

| AutoTrain environment | AutoTrain UI URL | `AUTOTRAIN_BACKEND_API` |
|---|---|---|
| `prod` | https://ui.autotrain.huggingface.co/projects | https://api.autotrain.huggingface.co |
| `staging` | https://ui-staging.autotrain.huggingface.co/projects | https://api-staging.autotrain.huggingface.co |
The current configuration for evaluation jobs running on Spaces is:

```
AUTOTRAIN_BACKEND_API=https://api.autotrain.huggingface.co
```

To evaluate models with a local instance of AutoTrain, change the environment variable to:

```
AUTOTRAIN_BACKEND_API=http://localhost:8000
```
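When working on the scripts locally, the backend URL can be picked up from the `.env` file. A minimal sketch, assuming `python-dotenv` is available (not necessarily a dependency of this project):

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads variables such as AUTOTRAIN_BACKEND_API from .env

# Fall back to the production endpoint if the variable is unset
backend = os.getenv("AUTOTRAIN_BACKEND_API", "https://api.autotrain.huggingface.co")
print(f"Evaluation payloads will be sent to: {backend}")
```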
## Migrating from staging to production (and vice versa)
In general, evaluation jobs should run in AutoTrain's `prod` environment, which is defined by the following environment variable:

```
AUTOTRAIN_BACKEND_API=https://api.autotrain.huggingface.co
```

However, there are times when it is necessary to run evaluation jobs in AutoTrain's `staging` environment (e.g. because a new evaluation pipeline is being deployed). In these cases the corresponding environment variable is:

```
AUTOTRAIN_BACKEND_API=https://api-staging.autotrain.huggingface.co
```

To migrate between these two environments, update the `AUTOTRAIN_BACKEND_API` variable in two places:
- In the repo secrets associated with the `model-evaluator` Space. This will ensure evaluation projects are created in the desired environment.
- In the GitHub Actions secrets associated with this repo. This will ensure that the correct evaluation jobs are approved and launched via the `run_evaluation_jobs.py` script.