EvaluationTracker
class lighteval.logging.evaluation_tracker.EvaluationTracker(
    output_dir: str,
    save_details: bool = True,
    push_to_hub: bool = False,
    push_to_tensorboard: bool = False,
    hub_results_org: str | None = '',
    tensorboard_metric_prefix: str = 'eval',
    public: bool = False,
    nanotron_run_info: GeneralArgs = None,
)
Parameters
- output_dir (`str`) — Local folder path where you want results to be saved.
- save_details (`bool`, defaults to `True`) — If `True`, details are saved to the `output_dir`.
- push_to_hub (`bool`, defaults to `False`) — If `True`, details are pushed to the hub. Results are pushed to `{hub_results_org}/details__{sanitized model_name}` for the model `model_name`, a public dataset if `public` is `True`, else to `{hub_results_org}/details__{sanitized model_name}_private`, a private dataset.
- push_to_tensorboard (`bool`, defaults to `False`) — If `True`, creates and pushes the results to a tensorboard folder on the hub.
- hub_results_org (`str`, optional) — The organisation to push the results to. See more details about the datasets organisation in `EvaluationTracker.save`.
- tensorboard_metric_prefix (`str`, defaults to `"eval"`) — Prefix for the metrics in the tensorboard logs.
- public (`bool`, defaults to `False`) — If `True`, results and details are pushed to public orgs.
- nanotron_run_info (`~nanotron.config.GeneralArgs`, optional) — Reference to information about Nanotron model runs.
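As a rough illustration of the naming scheme above, a details repository id could be derived like this (a sketch only: the `build_details_repo_id` helper and the exact sanitization rule, here replacing `/` with `__`, are assumptions for illustration, not lighteval's actual implementation):

```python
def build_details_repo_id(hub_results_org: str, model_name: str, public: bool = False) -> str:
    # Assumed sanitization: Hub repo names cannot contain "/",
    # so the model name's org separator is turned into "__".
    sanitized = model_name.replace("/", "__")
    repo_id = f"{hub_results_org}/details__{sanitized}"
    if not public:
        # Private details datasets get a distinct "_private" suffix.
        repo_id += "_private"
    return repo_id
```

For example, with `hub_results_org="org"` and `model_name="acme/model"`, this sketch yields `org/details__acme__model` for a public dataset and `org/details__acme__model_private` otherwise.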
Keeps track of the overall evaluation process and relevant information.
The EvaluationTracker contains specific loggers for experiments details (DetailsLogger), metrics (MetricsLogger), task versions (VersionsLogger) as well as for the general configurations of both the specific task (TaskConfigLogger) and overall evaluation run (GeneralConfigLogger). It compiles the data from these loggers and writes it to files, which can be published to the Hugging Face hub if requested.
Attributes:
- details_logger (DetailsLogger) — Logger for experiment details.
- metrics_logger (MetricsLogger) — Logger for experiment metrics.
- versions_logger (VersionsLogger) — Logger for task versions.
- general_config_logger (GeneralConfigLogger) — Logger for general configuration.
- task_config_logger (TaskConfigLogger) — Logger for task configuration.
Aggregates and returns all the loggers’ experiment information in a dictionary.
This function should be used to gather and display said information at the end of an evaluation run.
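Conceptually, that aggregation resembles the following stdlib-only sketch (the stand-in logger classes and the `generate_final_dict` signature are illustrative assumptions, not lighteval's API):

```python
class MetricsLogger:
    """Minimal stand-in for lighteval's metrics logger (illustrative only)."""
    def __init__(self):
        self.metrics = {}

    def log(self, task: str, metric: str, value: float):
        self.metrics.setdefault(task, {})[metric] = value


class VersionsLogger:
    """Minimal stand-in for the task-version logger (illustrative only)."""
    def __init__(self):
        self.versions = {}


def generate_final_dict(metrics: MetricsLogger, versions: VersionsLogger) -> dict:
    # Compile each logger's data into a single dictionary, as the
    # tracker does at the end of an evaluation run.
    return {"results": metrics.metrics, "versions": versions.versions}
```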
Pushes the experiment details (all the model predictions for every step) to the hub.
recreate_metadata_card
( repo_id: str )
Fully updates the details repository metadata card for the currently evaluated model.
Saves the experiment information and results to files, and to the hub if requested.