Lighteval documentation

EvaluationTracker

class lighteval.logging.evaluation_tracker.EvaluationTracker

( output_dir: str, save_details: bool = True, push_to_hub: bool = False, push_to_tensorboard: bool = False, hub_results_org: str | None = '', tensorboard_metric_prefix: str = 'eval', public: bool = False, nanotron_run_info: GeneralArgs = None )

Parameters

  • output_dir (str) — Local folder path where you want results to be saved.
  • save_details (bool, defaults to True) — If True, details are saved to the output_dir.
  • push_to_hub (bool, defaults to False) — If True, details are pushed to the hub. For a model model_name, results are pushed to {hub_results_org}/details__{sanitized model_name} as a public dataset if public is True, otherwise to {hub_results_org}/details__{sanitized model_name}_private as a private dataset.
  • push_to_tensorboard (bool, defaults to False) — If True, creates and pushes the results to a tensorboard folder on the hub.
  • hub_results_org (str, optional) — The organisation to push the results to. See more details about the datasets organisation in EvaluationTracker.save.
  • tensorboard_metric_prefix (str, defaults to “eval”) — Prefix for the metrics in the tensorboard logs.
  • public (bool, defaults to False) — If True, results and details are pushed to public orgs.
  • nanotron_run_info (~nanotron.config.GeneralArgs, optional) — Reference to information about Nanotron model runs.

Keeps track of the overall evaluation process and relevant information.

The EvaluationTracker contains specific loggers for experiment details (DetailsLogger), metrics (MetricsLogger), and task versions (VersionsLogger), as well as for the general configuration of both the specific task (TaskConfigLogger) and the overall evaluation run (GeneralConfigLogger). It compiles the data from these loggers and writes it to files, which can be published to the Hugging Face hub if requested.
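
For example, a tracker can be instantiated directly with the parameters listed above (a minimal sketch; the output path and organisation name are placeholders):

from lighteval.logging.evaluation_tracker import EvaluationTracker

# Placeholder values: "./eval_results" and "my-org" are examples, not defaults.
evaluation_tracker = EvaluationTracker(
    output_dir="./eval_results",   # local folder for results and details
    save_details=True,             # also write per-sample details locally
    push_to_hub=False,             # keep everything local for this run
    hub_results_org="my-org",      # only used when results are pushed to the hub
)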


generate_final_dict

( )

Aggregates and returns all the loggers’ experiment information in a dictionary.

This function should be used to gather and display said information at the end of an evaluation run.
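
A minimal usage sketch, assuming an evaluation_tracker configured as in the example above:

final_results = evaluation_tracker.generate_final_dict()
print(final_results)  # display the aggregated run information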

push_to_hub

( date_id: str, details: dict, results_dict: dict )

Pushes the experiment details (all the model predictions for every step) to the hub.

recreate_metadata_card

( repo_id: str )

Parameters

  • repo_id (str) — Details dataset repository path on the hub (org/dataset)

Fully updates the details repository metadata card for the currently evaluated model.
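
For example (the repository path below is a placeholder, following the details__{sanitized model_name} naming described above):

# Placeholder repo_id; point it at an existing details dataset on the hub.
evaluation_tracker.recreate_metadata_card(repo_id="my-org/details__my-model")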

save

( )

Saves the experiment information and results to files, and to the hub if requested.
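
A typical end-of-run call, assuming the tracker was configured as in the constructor example above:

# Writes results, details, and configuration to output_dir, and pushes them
# to the hub if the tracker was created with push_to_hub=True.
evaluation_tracker.save()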
