Available Tasks

You can get a list of all the available tasks by running:

lighteval tasks list
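If you capture that listing in a script, a small filter can narrow it down to one suite. This is a minimal sketch, assuming each task name appears on its own line in the `suite|task` form used in the list further down this page; the exact output format of `lighteval tasks list` may differ between versions, and `filter_by_suite` is a hypothetical helper, not part of Lighteval.

```python
from typing import Iterable, List


def filter_by_suite(lines: Iterable[str], suite: str) -> List[str]:
    """Return the task names in `lines` that belong to `suite`.

    Assumes one task name per line in the `suite|task` form; leading
    whitespace and bullet markers ("•", "-", "*") are stripped first.
    """
    prefix = f"{suite}|"
    names = []
    for line in lines:
        # Drop indentation and any bullet characters before the name.
        name = line.strip().lstrip("•-* ").strip()
        if name.startswith(prefix):
            names.append(name)
    return names


# Sample lines copied from the list on this page.
sample = [
    "    • leaderboard|arc:challenge",
    "    • leaderboard|gsm8k",
    "    • helm|mmlu:anatomy",
    "  • lighteval:",
]
print(filter_by_suite(sample, "leaderboard"))
# ['leaderboard|arc:challenge', 'leaderboard|gsm8k']
```

To feed it the real listing, one option is `subprocess.run(["lighteval", "tasks", "list"], capture_output=True, text=True).stdout.splitlines()`; this assumes the command prints names one per line, which you should check against your installed version.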

You can also inspect a specific task by running:

lighteval tasks inspect <task_name>
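The names in the list below follow a visible pattern: a suite prefix (`bigbench`, `harness`, `helm`, `leaderboard`, `lighteval`), a `|`, then the task, which often carries a `:`-separated subset such as `mmlu:anatomy`. The sketch below splits a name along that pattern. The three-part breakdown is inferred from the list on this page rather than a documented rule, and `TaskName` and `parse_task_name` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class TaskName(NamedTuple):
    suite: str             # e.g. "helm"
    benchmark: str         # e.g. "mmlu"
    subset: Optional[str]  # e.g. "anatomy", or None when there is no ":"


def parse_task_name(name: str) -> TaskName:
    """Split a name such as 'helm|mmlu:anatomy' into its parts.

    Only the first '|' and the first ':' act as separators, so
    'harness|wikitext:103:document_level' keeps '103:document_level'
    as its subset.
    """
    suite, sep, task = name.partition("|")
    if not sep or not suite or not task:
        raise ValueError(f"expected '<suite>|<task>', got {name!r}")
    benchmark, colon, subset = task.partition(":")
    return TaskName(suite, benchmark, subset if colon else None)


print(parse_task_name("helm|mmlu:anatomy"))
# TaskName(suite='helm', benchmark='mmlu', subset='anatomy')
print(parse_task_name("bigbench|anachronisms"))
# TaskName(suite='bigbench', benchmark='anachronisms', subset=None)
```

A few entries in the list, such as `helm|bbq=Nationality`, use `=` instead of `:`, so this sketch returns them with the whole remainder as the benchmark and no subset.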

List of tasks

  • bigbench:

    • bigbench|abstract_narrative_understanding
    • bigbench|anachronisms
    • bigbench|analogical_similarity
    • bigbench|analytic_entailment
    • bigbench|arithmetic_bb
    • bigbench|ascii_word_recognition
    • bigbench|authorship_verification
    • bigbench|auto_categorization
    • bigbench|auto_debugging
    • bigbench|bbq_lite_json
    • bigbench|bridging_anaphora_resolution_barqa
    • bigbench|causal_judgment
    • bigbench|cause_and_effect
    • bigbench|checkmate_in_one
    • bigbench|chess_state_tracking
    • bigbench|chinese_remainder_theorem
    • bigbench|cifar10_classification
    • bigbench|code_line_description
    • bigbench|codenames
    • bigbench|color
    • bigbench|common_morpheme
    • bigbench|conceptual_combinations
    • bigbench|conlang_translation
    • bigbench|contextual_parametric_knowledge_conflicts
    • bigbench|coqa_bb
    • bigbench|crash_blossom
    • bigbench|crass_ai
    • bigbench|cryobiology_spanish
    • bigbench|cryptonite
    • bigbench|cs_algorithms
    • bigbench|dark_humor_detection
    • bigbench|date_understanding
    • bigbench|disambiguation_qa
    • bigbench|discourse_marker_prediction
    • bigbench|disfl_qa
    • bigbench|dyck_languages
    • bigbench|elementary_math_qa
    • bigbench|emoji_movie
    • bigbench|emojis_emotion_prediction
    • bigbench|empirical_judgments
    • bigbench|english_proverbs
    • bigbench|english_russian_proverbs
    • bigbench|entailed_polarity
    • bigbench|entailed_polarity_hindi
    • bigbench|epistemic_reasoning
    • bigbench|evaluating_information_essentiality
    • bigbench|fact_checker
    • bigbench|fantasy_reasoning
    • bigbench|few_shot_nlg
    • bigbench|figure_of_speech_detection
    • bigbench|formal_fallacies_syllogisms_negation
    • bigbench|gem
    • bigbench|gender_inclusive_sentences_german
    • bigbench|general_knowledge
    • bigbench|geometric_shapes
    • bigbench|goal_step_wikihow
    • bigbench|gre_reading_comprehension
    • bigbench|hhh_alignment
    • bigbench|hindi_question_answering
    • bigbench|hindu_knowledge
    • bigbench|hinglish_toxicity
    • bigbench|human_organs_senses
    • bigbench|hyperbaton
    • bigbench|identify_math_theorems
    • bigbench|identify_odd_metaphor
    • bigbench|implicatures
    • bigbench|implicit_relations
    • bigbench|intent_recognition
    • bigbench|international_phonetic_alphabet_nli
    • bigbench|international_phonetic_alphabet_transliterate
    • bigbench|intersect_geometry
    • bigbench|irony_identification
    • bigbench|kanji_ascii
    • bigbench|kannada
    • bigbench|key_value_maps
    • bigbench|known_unknowns
    • bigbench|language_games
    • bigbench|language_identification
    • bigbench|linguistic_mappings
    • bigbench|linguistics_puzzles
    • bigbench|logic_grid_puzzle
    • bigbench|logical_args
    • bigbench|logical_deduction
    • bigbench|logical_fallacy_detection
    • bigbench|logical_sequence
    • bigbench|mathematical_induction
    • bigbench|matrixshapes
    • bigbench|metaphor_boolean
    • bigbench|metaphor_understanding
    • bigbench|minute_mysteries_qa
    • bigbench|misconceptions
    • bigbench|misconceptions_russian
    • bigbench|mnist_ascii
    • bigbench|modified_arithmetic
    • bigbench|moral_permissibility
    • bigbench|movie_dialog_same_or_different
    • bigbench|movie_recommendation
    • bigbench|mult_data_wrangling
    • bigbench|multiemo
    • bigbench|natural_instructions
    • bigbench|navigate
    • bigbench|nonsense_words_grammar
    • bigbench|novel_concepts
    • bigbench|object_counting
    • bigbench|odd_one_out
    • bigbench|operators
    • bigbench|paragraph_segmentation
    • bigbench|parsinlu_qa
    • bigbench|parsinlu_reading_comprehension
    • bigbench|penguins_in_a_table
    • bigbench|periodic_elements
    • bigbench|persian_idioms
    • bigbench|phrase_relatedness
    • bigbench|physical_intuition
    • bigbench|physics
    • bigbench|physics_questions
    • bigbench|play_dialog_same_or_different
    • bigbench|polish_sequence_labeling
    • bigbench|presuppositions_as_nli
    • bigbench|qa_wikidata
    • bigbench|question_selection
    • bigbench|real_or_fake_text
    • bigbench|reasoning_about_colored_objects
    • bigbench|repeat_copy_logic
    • bigbench|rephrase
    • bigbench|rhyming
    • bigbench|riddle_sense
    • bigbench|ruin_names
    • bigbench|salient_translation_error_detection
    • bigbench|scientific_press_release
    • bigbench|semantic_parsing_in_context_sparc
    • bigbench|semantic_parsing_spider
    • bigbench|sentence_ambiguity
    • bigbench|similarities_abstraction
    • bigbench|simp_turing_concept
    • bigbench|simple_arithmetic_json
    • bigbench|simple_arithmetic_json_multiple_choice
    • bigbench|simple_arithmetic_json_subtasks
    • bigbench|simple_arithmetic_multiple_targets_json
    • bigbench|simple_ethical_questions
    • bigbench|simple_text_editing
    • bigbench|snarks
    • bigbench|social_iqa
    • bigbench|social_support
    • bigbench|sports_understanding
    • bigbench|strange_stories
    • bigbench|strategyqa
    • bigbench|sufficient_information
    • bigbench|suicide_risk
    • bigbench|swahili_english_proverbs
    • bigbench|swedish_to_german_proverbs
    • bigbench|symbol_interpretation
    • bigbench|tellmewhy
    • bigbench|temporal_sequences
    • bigbench|tense
    • bigbench|timedial
    • bigbench|topical_chat
    • bigbench|tracking_shuffled_objects
    • bigbench|understanding_fables
    • bigbench|undo_permutation
    • bigbench|unit_conversion
    • bigbench|unit_interpretation
    • bigbench|unnatural_in_context_learning
    • bigbench|vitaminc_fact_verification
    • bigbench|what_is_the_tao
    • bigbench|which_wiki_edit
    • bigbench|wino_x_german
    • bigbench|winowhy
    • bigbench|word_sorting
    • bigbench|word_unscrambling
  • harness:

    • harness|bbh:boolean_expressions
    • harness|bbh:causal_judgment
    • harness|bbh:date_understanding
    • harness|bbh:disambiguation_qa
    • harness|bbh:dyck_languages
    • harness|bbh:formal_fallacies
    • harness|bbh:geometric_shapes
    • harness|bbh:hyperbaton
    • harness|bbh:logical_deduction_five_objects
    • harness|bbh:logical_deduction_seven_objects
    • harness|bbh:logical_deduction_three_objects
    • harness|bbh:movie_recommendation
    • harness|bbh:multistep_arithmetic_two
    • harness|bbh:navigate
    • harness|bbh:object_counting
    • harness|bbh:penguins_in_a_table
    • harness|bbh:reasoning_about_colored_objects
    • harness|bbh:ruin_names
    • harness|bbh:salient_translation_error_detection
    • harness|bbh:snarks
    • harness|bbh:sports_understanding
    • harness|bbh:temporal_sequences
    • harness|bbh:tracking_shuffled_objects_five_objects
    • harness|bbh:tracking_shuffled_objects_seven_objects
    • harness|bbh:tracking_shuffled_objects_three_objects
    • harness|bbh:web_of_lies
    • harness|bbh:word_sorting
    • harness|bigbench:causal_judgment
    • harness|bigbench:date_understanding
    • harness|bigbench:disambiguation_qa
    • harness|bigbench:geometric_shapes
    • harness|bigbench:logical_deduction_five_objects
    • harness|bigbench:logical_deduction_seven_objects
    • harness|bigbench:logical_deduction_three_objects
    • harness|bigbench:movie_recommendation
    • harness|bigbench:navigate
    • harness|bigbench:reasoning_about_colored_objects
    • harness|bigbench:ruin_names
    • harness|bigbench:salient_translation_error_detection
    • harness|bigbench:snarks
    • harness|bigbench:sports_understanding
    • harness|bigbench:temporal_sequences
    • harness|bigbench:tracking_shuffled_objects_five_objects
    • harness|bigbench:tracking_shuffled_objects_seven_objects
    • harness|bigbench:tracking_shuffled_objects_three_objects
    • harness|wikitext:103:document_level
  • helm:

    • helm|babi_qa
    • helm|bbq
    • helm|bbq:Age
    • helm|bbq:Disability_status
    • helm|bbq:Gender_identity
    • helm|bbq:Physical_appearance
    • helm|bbq:Race_ethnicity
    • helm|bbq:Race_x_SES
    • helm|bbq:Race_x_gender
    • helm|bbq:Religion
    • helm|bbq:SES
    • helm|bbq:Sexual_orientation
    • helm|bbq=Nationality
    • helm|bigbench:auto_debugging
    • helm|bigbench:bbq_lite_json:age_ambig
    • helm|bigbench:bbq_lite_json:age_disambig
    • helm|bigbench:bbq_lite_json:disability_status_ambig
    • helm|bigbench:bbq_lite_json:disability_status_disambig
    • helm|bigbench:bbq_lite_json:gender_identity_ambig
    • helm|bigbench:bbq_lite_json:gender_identity_disambig
    • helm|bigbench:bbq_lite_json:nationality_ambig
    • helm|bigbench:bbq_lite_json:nationality_disambig
    • helm|bigbench:bbq_lite_json:physical_appearance_ambig
    • helm|bigbench:bbq_lite_json:physical_appearance_disambig
    • helm|bigbench:bbq_lite_json:race_ethnicity_ambig
    • helm|bigbench:bbq_lite_json:race_ethnicity_disambig
    • helm|bigbench:bbq_lite_json:religion_ambig
    • helm|bigbench:bbq_lite_json:religion_disambig
    • helm|bigbench:bbq_lite_json:ses_ambig
    • helm|bigbench:bbq_lite_json:ses_disambig
    • helm|bigbench:bbq_lite_json:sexual_orientation_ambig
    • helm|bigbench:bbq_lite_json:sexual_orientation_disambig
    • helm|bigbench:code_line_description
    • helm|bigbench:conceptual_combinations:contradictions
    • helm|bigbench:conceptual_combinations:emergent_properties
    • helm|bigbench:conceptual_combinations:fanciful_fictional_combinations
    • helm|bigbench:conceptual_combinations:homonyms
    • helm|bigbench:conceptual_combinations:invented_words
    • helm|bigbench:conlang_translation:adna_from
    • helm|bigbench:conlang_translation:adna_to
    • helm|bigbench:conlang_translation:atikampe_from
    • helm|bigbench:conlang_translation:atikampe_to
    • helm|bigbench:conlang_translation:gornam_from
    • helm|bigbench:conlang_translation:gornam_to
    • helm|bigbench:conlang_translation:holuan_from
    • helm|bigbench:conlang_translation:holuan_to
    • helm|bigbench:conlang_translation:mkafala_from
    • helm|bigbench:conlang_translation:mkafala_to
    • helm|bigbench:conlang_translation:postpositive_english_from
    • helm|bigbench:conlang_translation:postpositive_english_to
    • helm|bigbench:conlang_translation:unapuri_from
    • helm|bigbench:conlang_translation:unapuri_to
    • helm|bigbench:conlang_translation:vaomi_from
    • helm|bigbench:conlang_translation:vaomi_to
    • helm|bigbench:emoji_movie
    • helm|bigbench:formal_fallacies_syllogisms_negation
    • helm|bigbench:hindu_knowledge
    • helm|bigbench:known_unknowns
    • helm|bigbench:language_identification
    • helm|bigbench:linguistics_puzzles
    • helm|bigbench:logic_grid_puzzle
    • helm|bigbench:logical_deduction-five_objects
    • helm|bigbench:logical_deduction-seven_objects
    • helm|bigbench:logical_deduction-three_objects
    • helm|bigbench:misconceptions_russian
    • helm|bigbench:novel_concepts
    • helm|bigbench:operators
    • helm|bigbench:parsinlu_reading_comprehension
    • helm|bigbench:play_dialog_same_or_different
    • helm|bigbench:repeat_copy_logic
    • helm|bigbench:strange_stories-boolean
    • helm|bigbench:strange_stories-multiple_choice
    • helm|bigbench:strategyqa
    • helm|bigbench:symbol_interpretation-adversarial
    • helm|bigbench:symbol_interpretation-emoji_agnostic
    • helm|bigbench:symbol_interpretation-name_agnostic
    • helm|bigbench:symbol_interpretation-plain
    • helm|bigbench:symbol_interpretation-tricky
    • helm|bigbench:vitaminc_fact_verification
    • helm|bigbench:winowhy
    • helm|blimp:adjunct_island
    • helm|blimp:anaphor_gender_agreement
    • helm|blimp:anaphor_number_agreement
    • helm|blimp:animate_subject_passive
    • helm|blimp:animate_subject_trans
    • helm|blimp:causative
    • helm|blimp:complex_NP_island
    • helm|blimp:coordinate_structure_constraint_complex_left_branch
    • helm|blimp:coordinate_structure_constraint_object_extraction
    • helm|blimp:determiner_noun_agreement_1
    • helm|blimp:determiner_noun_agreement_2
    • helm|blimp:determiner_noun_agreement_irregular_1
    • helm|blimp:determiner_noun_agreement_irregular_2
    • helm|blimp:determiner_noun_agreement_with_adj_2
    • helm|blimp:determiner_noun_agreement_with_adj_irregular_1
    • helm|blimp:determiner_noun_agreement_with_adj_irregular_2
    • helm|blimp:determiner_noun_agreement_with_adjective_1
    • helm|blimp:distractor_agreement_relational_noun
    • helm|blimp:distractor_agreement_relative_clause
    • helm|blimp:drop_argument
    • helm|blimp:ellipsis_n_bar_1
    • helm|blimp:ellipsis_n_bar_2
    • helm|blimp:existential_there_object_raising
    • helm|blimp:existential_there_quantifiers_1
    • helm|blimp:existential_there_quantifiers_2
    • helm|blimp:existential_there_subject_raising
    • helm|blimp:expletive_it_object_raising
    • helm|blimp:inchoative
    • helm|blimp:intransitive
    • helm|blimp:irregular_past_participle_adjectives
    • helm|blimp:irregular_past_participle_verbs
    • helm|blimp:irregular_plural_subject_verb_agreement_1
    • helm|blimp:irregular_plural_subject_verb_agreement_2
    • helm|blimp:left_branch_island_echo_question
    • helm|blimp:left_branch_island_simple_question
    • helm|blimp:matrix_question_npi_licensor_present
    • helm|blimp:npi_present_1
    • helm|blimp:npi_present_2
    • helm|blimp:only_npi_licensor_present
    • helm|blimp:only_npi_scope
    • helm|blimp:passive_1
    • helm|blimp:passive_2
    • helm|blimp:principle_A_c_command
    • helm|blimp:principle_A_case_1
    • helm|blimp:principle_A_case_2
    • helm|blimp:principle_A_domain_1
    • helm|blimp:principle_A_domain_2
    • helm|blimp:principle_A_domain_3
    • helm|blimp:principle_A_reconstruction
    • helm|blimp:regular_plural_subject_verb_agreement_1
    • helm|blimp:regular_plural_subject_verb_agreement_2
    • helm|blimp:sentential_negation_npi_licensor_present
    • helm|blimp:sentential_negation_npi_scope
    • helm|blimp:sentential_subject_island
    • helm|blimp:superlative_quantifiers_1
    • helm|blimp:superlative_quantifiers_2
    • helm|blimp:tough_vs_raising_1
    • helm|blimp:tough_vs_raising_2
    • helm|blimp:transitive
    • helm|blimp:wh_island
    • helm|blimp:wh_questions_object_gap
    • helm|blimp:wh_questions_subject_gap
    • helm|blimp:wh_questions_subject_gap_long_distance
    • helm|blimp:wh_vs_that_no_gap
    • helm|blimp:wh_vs_that_no_gap_long_distance
    • helm|blimp:wh_vs_that_with_gap
    • helm|blimp:wh_vs_that_with_gap_long_distance
    • helm|bold
    • helm|bold:gender
    • helm|bold:political_ideology
    • helm|bold:profession
    • helm|bold:race
    • helm|bold:religious_ideology
    • helm|boolq
    • helm|boolq:contrastset
    • helm|civil_comments
    • helm|civil_comments:LGBTQ
    • helm|civil_comments:black
    • helm|civil_comments:christian
    • helm|civil_comments:female
    • helm|civil_comments:male
    • helm|civil_comments:muslim
    • helm|civil_comments:other_religions
    • helm|civil_comments:white
    • helm|commonsenseqa
    • helm|copyright:n_books_1000-extractions_per_book_1-prefix_length_125
    • helm|copyright:n_books_1000-extractions_per_book_1-prefix_length_25
    • helm|copyright:n_books_1000-extractions_per_book_1-prefix_length_5
    • helm|copyright:n_books_1000-extractions_per_book_3-prefix_length_125
    • helm|copyright:n_books_1000-extractions_per_book_3-prefix_length_25
    • helm|copyright:n_books_1000-extractions_per_book_3-prefix_length_5
    • helm|copyright:oh_the_places
    • helm|copyright:pilot
    • helm|copyright:popular_books-prefix_length_10
    • helm|copyright:popular_books-prefix_length_125
    • helm|copyright:popular_books-prefix_length_25
    • helm|copyright:popular_books-prefix_length_250
    • helm|copyright:popular_books-prefix_length_5
    • helm|copyright:popular_books-prefix_length_50
    • helm|copyright:prompt_num_line_1-min_lines_20
    • helm|copyright:prompt_num_line_10-min_lines_20
    • helm|copyright:prompt_num_line_5-min_lines_20
    • helm|covid_dialogue
    • helm|dyck_language:2
    • helm|dyck_language:3
    • helm|dyck_language:4
    • helm|entity_data_imputation:Buy
    • helm|entity_data_imputation:Restaurant
    • helm|entity_matching:Abt_Buy
    • helm|entity_matching:Amazon_Google
    • helm|entity_matching:Beer
    • helm|entity_matching:Company
    • helm|entity_matching:DBLP_ACM
    • helm|entity_matching:DBLP_GoogleScholar
    • helm|entity_matching:Dirty_DBLP_ACM
    • helm|entity_matching:Dirty_DBLP_GoogleScholar
    • helm|entity_matching:Dirty_Walmart_Amazon
    • helm|entity_matching:Dirty_iTunes_Amazon
    • helm|entity_matching:Walmart_Amazon
    • helm|entity_matching:iTunes_Amazon
    • helm|entity_matching=Fodors_Zagats
    • helm|hellaswag
    • helm|imdb
    • helm|imdb:contrastset
    • helm|interactive_qa_mmlu:abstract_algebra
    • helm|interactive_qa_mmlu:college_chemistry
    • helm|interactive_qa_mmlu:global_facts
    • helm|interactive_qa_mmlu:miscellaneous
    • helm|interactive_qa_mmlu:nutrition
    • helm|interactive_qa_mmlu:us_foreign_policy
    • helm|legal_summarization:billsum
    • helm|legal_summarization:eurlexsum
    • helm|legal_summarization:multilexsum
    • helm|legalsupport
    • helm|lexglue:case_hold
    • helm|lexglue:ecthr_a
    • helm|lexglue:ecthr_b
    • helm|lexglue:eurlex
    • helm|lexglue:ledgar
    • helm|lexglue:scotus
    • helm|lexglue:unfair_tos
    • helm|lextreme:brazilian_court_decisions_judgment
    • helm|lextreme:brazilian_court_decisions_unanimity
    • helm|lextreme:covid19_emergency_event
    • helm|lextreme:german_argument_mining
    • helm|lextreme:greek_legal_code_chapter
    • helm|lextreme:greek_legal_code_subject
    • helm|lextreme:greek_legal_code_volume
    • helm|lextreme:greek_legal_ner
    • helm|lextreme:legalnero
    • helm|lextreme:lener_br
    • helm|lextreme:mapa_coarse
    • helm|lextreme:mapa_fine
    • helm|lextreme:multi_eurlex_level_1
    • helm|lextreme:multi_eurlex_level_2
    • helm|lextreme:multi_eurlex_level_3
    • helm|lextreme:online_terms_of_service_clause_topics
    • helm|lextreme:online_terms_of_service_unfairness_levels
    • helm|lextreme:swiss_judgment_prediction
    • helm|lsat_qa
    • helm|lsat_qa:assignment
    • helm|lsat_qa:grouping
    • helm|lsat_qa:miscellaneous
    • helm|lsat_qa:ordering
    • helm|me_q_sum
    • helm|med_dialog:healthcaremagic
    • helm|med_dialog:icliniq
    • helm|med_mcqa
    • helm|med_paragraph_simplification
    • helm|med_qa
    • helm|mmlu
    • helm|mmlu:abstract_algebra
    • helm|mmlu:anatomy
    • helm|mmlu:astronomy
    • helm|mmlu:business_ethics
    • helm|mmlu:clinical_knowledge
    • helm|mmlu:college_biology
    • helm|mmlu:college_chemistry
    • helm|mmlu:college_computer_science
    • helm|mmlu:college_mathematics
    • helm|mmlu:college_medicine
    • helm|mmlu:college_physics
    • helm|mmlu:computer_security
    • helm|mmlu:conceptual_physics
    • helm|mmlu:econometrics
    • helm|mmlu:electrical_engineering
    • helm|mmlu:elementary_mathematics
    • helm|mmlu:formal_logic
    • helm|mmlu:global_facts
    • helm|mmlu:high_school_biology
    • helm|mmlu:high_school_chemistry
    • helm|mmlu:high_school_computer_science
    • helm|mmlu:high_school_european_history
    • helm|mmlu:high_school_geography
    • helm|mmlu:high_school_government_and_politics
    • helm|mmlu:high_school_macroeconomics
    • helm|mmlu:high_school_mathematics
    • helm|mmlu:high_school_microeconomics
    • helm|mmlu:high_school_physics
    • helm|mmlu:high_school_psychology
    • helm|mmlu:high_school_statistics
    • helm|mmlu:high_school_us_history
    • helm|mmlu:high_school_world_history
    • helm|mmlu:human_aging
    • helm|mmlu:human_sexuality
    • helm|mmlu:international_law
    • helm|mmlu:jurisprudence
    • helm|mmlu:logical_fallacies
    • helm|mmlu:machine_learning
    • helm|mmlu:management
    • helm|mmlu:marketing
    • helm|mmlu:medical_genetics
    • helm|mmlu:miscellaneous
    • helm|mmlu:moral_disputes
    • helm|mmlu:moral_scenarios
    • helm|mmlu:nutrition
    • helm|mmlu:philosophy
    • helm|mmlu:prehistory
    • helm|mmlu:professional_accounting
    • helm|mmlu:professional_law
    • helm|mmlu:professional_medicine
    • helm|mmlu:professional_psychology
    • helm|mmlu:public_relations
    • helm|mmlu:security_studies
    • helm|mmlu:sociology
    • helm|mmlu:us_foreign_policy
    • helm|mmlu:virology
    • helm|mmlu:world_religions
    • helm|narrativeqa
    • helm|numeracy:linear_example
    • helm|numeracy:linear_standard
    • helm|numeracy:parabola_example
    • helm|numeracy:parabola_standard
    • helm|numeracy:paraboloid_example
    • helm|numeracy:paraboloid_standard
    • helm|numeracy:plane_example
    • helm|numeracy:plane_standard
    • helm|openbookqa
    • helm|piqa
    • helm|pubmedqa
    • helm|quac
    • helm|raft:ade_corpus_v2
    • helm|raft:banking_77
    • helm|raft:neurips_impact_statement_risks
    • helm|raft:one_stop_english
    • helm|raft:overruling
    • helm|raft:semiconductor_org_types
    • helm|raft:systematic_review_inclusion
    • helm|raft:tai_safety_research
    • helm|raft:terms_of_service
    • helm|raft:tweet_eval_hate
    • helm|raft:twitter_complaints
    • helm|real_toxicity_prompts
    • helm|siqa
    • helm|summarization:cnn-dm
    • helm|summarization:xsum
    • helm|summarization:xsum-sampled
    • helm|synthetic_reasoning:induction
    • helm|synthetic_reasoning:natural_easy
    • helm|synthetic_reasoning:natural_hard
    • helm|synthetic_reasoning:pattern_match
    • helm|synthetic_reasoning:variable_substitution
    • helm|the_pile:arxiv
    • helm|the_pile:bibliotik
    • helm|the_pile:commoncrawl
    • helm|the_pile:dm-mathematics
    • helm|the_pile:enron
    • helm|the_pile:europarl
    • helm|the_pile:freelaw
    • helm|the_pile:github
    • helm|the_pile:gutenberg
    • helm|the_pile:hackernews
    • helm|the_pile:nih-exporter
    • helm|the_pile:opensubtitles
    • helm|the_pile:openwebtext2
    • helm|the_pile:pubmed-abstracts
    • helm|the_pile:pubmed-central
    • helm|the_pile:stackexchange
    • helm|the_pile:upsto
    • helm|the_pile:wikipedia
    • helm|the_pile:youtubesubtitles
    • helm|truthfulqa
    • helm|twitterAAE:aa
    • helm|twitterAAE:white
    • helm|wikifact:applies_to_jurisdiction
    • helm|wikifact:atomic_number
    • helm|wikifact:author
    • helm|wikifact:award_received
    • helm|wikifact:basic_form_of_government
    • helm|wikifact:capital
    • helm|wikifact:capital_of
    • helm|wikifact:central_bank
    • helm|wikifact:composer
    • helm|wikifact:continent
    • helm|wikifact:country
    • helm|wikifact:country_of_citizenship
    • helm|wikifact:country_of_origin
    • helm|wikifact:creator
    • helm|wikifact:currency
    • helm|wikifact:defendant
    • helm|wikifact:developer
    • helm|wikifact:diplomatic_relation
    • helm|wikifact:director
    • helm|wikifact:discoverer_or_inventor
    • helm|wikifact:drug_or_therapy_used_for_treatment
    • helm|wikifact:educated_at
    • helm|wikifact:electron_configuration
    • helm|wikifact:employer
    • helm|wikifact:field_of_work
    • helm|wikifact:file_extension
    • helm|wikifact:genetic_association
    • helm|wikifact:genre
    • helm|wikifact:has_part
    • helm|wikifact:head_of_government
    • helm|wikifact:head_of_state
    • helm|wikifact:headquarters_location
    • helm|wikifact:industry
    • helm|wikifact:influenced_by
    • helm|wikifact:instance_of
    • helm|wikifact:instrument
    • helm|wikifact:language_of_work_or_name
    • helm|wikifact:languages_spoken_written_or_signed
    • helm|wikifact:laws_applied
    • helm|wikifact:located_in_the_administrative_territorial_entity
    • helm|wikifact:location
    • helm|wikifact:location_of_discovery
    • helm|wikifact:location_of_formation
    • helm|wikifact:majority_opinion_by
    • helm|wikifact:manufacturer
    • helm|wikifact:measured_physical_quantity
    • helm|wikifact:medical_condition_treated
    • helm|wikifact:member_of
    • helm|wikifact:member_of_political_party
    • helm|wikifact:member_of_sports_team
    • helm|wikifact:movement
    • helm|wikifact:named_after
    • helm|wikifact:native_language
    • helm|wikifact:number_of_processor_cores
    • helm|wikifact:occupation
    • helm|wikifact:office_held_by_head_of_government
    • helm|wikifact:office_held_by_head_of_state
    • helm|wikifact:official_language
    • helm|wikifact:operating_system
    • helm|wikifact:original_language_of_film_or_TV_show
    • helm|wikifact:original_network
    • helm|wikifact:overrules
    • helm|wikifact:owned_by
    • helm|wikifact:part_of
    • helm|wikifact:participating_team
    • helm|wikifact:place_of_birth
    • helm|wikifact:place_of_death
    • helm|wikifact:plaintiff
    • helm|wikifact:position_held
    • helm|wikifact:position_played_on_team
    • helm|wikifact:programming_language
    • helm|wikifact:recommended_unit_of_measurement
    • helm|wikifact:record_label
    • helm|wikifact:religion
    • helm|wikifact:repealed_by
    • helm|wikifact:shares_border_with
    • helm|wikifact:solved_by
    • helm|wikifact:statement_describes
    • helm|wikifact:stock_exchange
    • helm|wikifact:subclass_of
    • helm|wikifact:subsidiary
    • helm|wikifact:symptoms_and_signs
    • helm|wikifact:therapeutic_area
    • helm|wikifact:time_of_discovery_or_invention
    • helm|wikifact:twinned_administrative_body
    • helm|wikifact:work_location
    • helm|wikitext:103:document_level
    • helm|wmt14:cs-en
    • helm|wmt14:de-en
    • helm|wmt14:fr-en
    • helm|wmt14:hi-en
    • helm|wmt14:ru-en
  • leaderboard:

    • leaderboard|arc:challenge
    • leaderboard|gsm8k
    • leaderboard|hellaswag
    • leaderboard|mmlu:abstract_algebra
    • leaderboard|mmlu:anatomy
    • leaderboard|mmlu:astronomy
    • leaderboard|mmlu:business_ethics
    • leaderboard|mmlu:clinical_knowledge
    • leaderboard|mmlu:college_biology
    • leaderboard|mmlu:college_chemistry
    • leaderboard|mmlu:college_computer_science
    • leaderboard|mmlu:college_mathematics
    • leaderboard|mmlu:college_medicine
    • leaderboard|mmlu:college_physics
    • leaderboard|mmlu:computer_security
    • leaderboard|mmlu:conceptual_physics
    • leaderboard|mmlu:econometrics
    • leaderboard|mmlu:electrical_engineering
    • leaderboard|mmlu:elementary_mathematics
    • leaderboard|mmlu:formal_logic
    • leaderboard|mmlu:global_facts
    • leaderboard|mmlu:high_school_biology
    • leaderboard|mmlu:high_school_chemistry
    • leaderboard|mmlu:high_school_computer_science
    • leaderboard|mmlu:high_school_european_history
    • leaderboard|mmlu:high_school_geography
    • leaderboard|mmlu:high_school_government_and_politics
    • leaderboard|mmlu:high_school_macroeconomics
    • leaderboard|mmlu:high_school_mathematics
    • leaderboard|mmlu:high_school_microeconomics
    • leaderboard|mmlu:high_school_physics
    • leaderboard|mmlu:high_school_psychology
    • leaderboard|mmlu:high_school_statistics
    • leaderboard|mmlu:high_school_us_history
    • leaderboard|mmlu:high_school_world_history
    • leaderboard|mmlu:human_aging
    • leaderboard|mmlu:human_sexuality
    • leaderboard|mmlu:international_law
    • leaderboard|mmlu:jurisprudence
    • leaderboard|mmlu:logical_fallacies
    • leaderboard|mmlu:machine_learning
    • leaderboard|mmlu:management
    • leaderboard|mmlu:marketing
    • leaderboard|mmlu:medical_genetics
    • leaderboard|mmlu:miscellaneous
    • leaderboard|mmlu:moral_disputes
    • leaderboard|mmlu:moral_scenarios
    • leaderboard|mmlu:nutrition
    • leaderboard|mmlu:philosophy
    • leaderboard|mmlu:prehistory
    • leaderboard|mmlu:professional_accounting
    • leaderboard|mmlu:professional_law
    • leaderboard|mmlu:professional_medicine
    • leaderboard|mmlu:professional_psychology
    • leaderboard|mmlu:public_relations
    • leaderboard|mmlu:security_studies
    • leaderboard|mmlu:sociology
    • leaderboard|mmlu:us_foreign_policy
    • leaderboard|mmlu:virology
    • leaderboard|mmlu:world_religions
    • leaderboard|truthfulqa:mc
    • leaderboard|winogrande
  • lighteval:

    • lighteval|agieval:aqua-rat
    • lighteval|agieval:gaokao-biology
    • lighteval|agieval:gaokao-chemistry
    • lighteval|agieval:gaokao-chinese
    • lighteval|agieval:gaokao-english
    • lighteval|agieval:gaokao-geography
    • lighteval|agieval:gaokao-history
    • lighteval|agieval:gaokao-mathqa
    • lighteval|agieval:gaokao-physics
    • lighteval|agieval:logiqa-en
    • lighteval|agieval:logiqa-zh
    • lighteval|agieval:lsat-ar
    • lighteval|agieval:lsat-lr
    • lighteval|agieval:lsat-rc
    • lighteval|agieval:sat-en
    • lighteval|agieval:sat-en-without-passage
    • lighteval|agieval:sat-math
    • lighteval|anli
    • lighteval|anli:r1
    • lighteval|anli:r2
    • lighteval|anli:r3
    • lighteval|arc:easy
    • lighteval|arithmetic:1dc
    • lighteval|arithmetic:2da
    • lighteval|arithmetic:2dm
    • lighteval|arithmetic:2ds
    • lighteval|arithmetic:3da
    • lighteval|arithmetic:3ds
    • lighteval|arithmetic:4da
    • lighteval|arithmetic:4ds
    • lighteval|arithmetic:5da
    • lighteval|arithmetic:5ds
    • lighteval|asdiv
    • lighteval|bigbench:causal_judgment
    • lighteval|bigbench:date_understanding
    • lighteval|bigbench:disambiguation_qa
    • lighteval|bigbench:geometric_shapes
    • lighteval|bigbench:logical_deduction_five_objects
    • lighteval|bigbench:logical_deduction_seven_objects
    • lighteval|bigbench:logical_deduction_three_objects
    • lighteval|bigbench:movie_recommendation
    • lighteval|bigbench:navigate
    • lighteval|bigbench:reasoning_about_colored_objects
    • lighteval|bigbench:ruin_names
    • lighteval|bigbench:salient_translation_error_detection
    • lighteval|bigbench:snarks
    • lighteval|bigbench:sports_understanding
    • lighteval|bigbench:temporal_sequences
    • lighteval|bigbench:tracking_shuffled_objects_five_objects
    • lighteval|bigbench:tracking_shuffled_objects_seven_objects
    • lighteval|bigbench:tracking_shuffled_objects_three_objects
    • lighteval|blimp:adjunct_island
    • lighteval|blimp:anaphor_gender_agreement
    • lighteval|blimp:anaphor_number_agreement
    • lighteval|blimp:animate_subject_passive
    • lighteval|blimp:animate_subject_trans
    • lighteval|blimp:causative
    • lighteval|blimp:complex_NP_island
    • lighteval|blimp:coordinate_structure_constraint_complex_left_branch
    • lighteval|blimp:coordinate_structure_constraint_object_extraction
    • lighteval|blimp:determiner_noun_agreement_1
    • lighteval|blimp:determiner_noun_agreement_2
    • lighteval|blimp:determiner_noun_agreement_irregular_1
    • lighteval|blimp:determiner_noun_agreement_irregular_2
    • lighteval|blimp:determiner_noun_agreement_with_adj_2
    • lighteval|blimp:determiner_noun_agreement_with_adj_irregular_1
    • lighteval|blimp:determiner_noun_agreement_with_adj_irregular_2
    • lighteval|blimp:determiner_noun_agreement_with_adjective_1
    • lighteval|blimp:distractor_agreement_relational_noun
    • lighteval|blimp:distractor_agreement_relative_clause
    • lighteval|blimp:drop_argument
    • lighteval|blimp:ellipsis_n_bar_1
    • lighteval|blimp:ellipsis_n_bar_2
    • lighteval|blimp:existential_there_object_raising
    • lighteval|blimp:existential_there_quantifiers_1
    • lighteval|blimp:existential_there_quantifiers_2
    • lighteval|blimp:existential_there_subject_raising
    • lighteval|blimp:expletive_it_object_raising
    • lighteval|blimp:inchoative
    • lighteval|blimp:intransitive
    • lighteval|blimp:irregular_past_participle_adjectives
    • lighteval|blimp:irregular_past_participle_verbs
    • lighteval|blimp:irregular_plural_subject_verb_agreement_1
    • lighteval|blimp:irregular_plural_subject_verb_agreement_2
    • lighteval|blimp:left_branch_island_echo_question
    • lighteval|blimp:left_branch_island_simple_question
    • lighteval|blimp:matrix_question_npi_licensor_present
    • lighteval|blimp:npi_present_1
    • lighteval|blimp:npi_present_2
    • lighteval|blimp:only_npi_licensor_present
    • lighteval|blimp:only_npi_scope
    • lighteval|blimp:passive_1
    • lighteval|blimp:passive_2
    • lighteval|blimp:principle_A_c_command
    • lighteval|blimp:principle_A_case_1
    • lighteval|blimp:principle_A_case_2
    • lighteval|blimp:principle_A_domain_1
    • lighteval|blimp:principle_A_domain_2
    • lighteval|blimp:principle_A_domain_3
    • lighteval|blimp:principle_A_reconstruction
    • lighteval|blimp:regular_plural_subject_verb_agreement_1
    • lighteval|blimp:regular_plural_subject_verb_agreement_2
    • lighteval|blimp:sentential_negation_npi_licensor_present
    • lighteval|blimp:sentential_negation_npi_scope
    • lighteval|blimp:sentential_subject_island
    • lighteval|blimp:superlative_quantifiers_1
    • lighteval|blimp:superlative_quantifiers_2
    • lighteval|blimp:tough_vs_raising_1
    • lighteval|blimp:tough_vs_raising_2
    • lighteval|blimp:transitive
    • lighteval|blimp:wh_island
    • lighteval|blimp:wh_questions_object_gap
    • lighteval|blimp:wh_questions_subject_gap
    • lighteval|blimp:wh_questions_subject_gap_long_distance
    • lighteval|blimp:wh_vs_that_no_gap
    • lighteval|blimp:wh_vs_that_no_gap_long_distance
    • lighteval|blimp:wh_vs_that_with_gap
    • lighteval|blimp:wh_vs_that_with_gap_long_distance
    • lighteval|coqa
    • lighteval|coqa_bb
    • lighteval|drop
    • lighteval|ethics:commonsense
    • lighteval|ethics:deontology
    • lighteval|ethics:justice
    • lighteval|ethics:utilitarianism
    • lighteval|ethics:virtue
    • lighteval|glue:cola
    • lighteval|glue:mnli
    • lighteval|glue:mnli_mismatched
    • lighteval|glue:mrpc
    • lighteval|glue:qnli
    • lighteval|glue:qqp
    • lighteval|glue:rte
    • lighteval|glue:sst2
    • lighteval|glue:stsb
    • lighteval|glue:wnli
    • lighteval|gpqa
    • lighteval|gsm8k
    • lighteval|headqa:en
    • lighteval|headqa:es
    • lighteval|iwslt17:ar-en
    • lighteval|iwslt17:de-en
    • lighteval|iwslt17:en-ar
    • lighteval|iwslt17:en-de
    • lighteval|iwslt17:en-fr
    • lighteval|iwslt17:en-ja
    • lighteval|iwslt17:en-ko
    • lighteval|iwslt17:en-zh
    • lighteval|iwslt17:fr-en
    • lighteval|iwslt17:ja-en
    • lighteval|iwslt17:ko-en
    • lighteval|iwslt17:zh-en
    • lighteval|lambada:openai
    • lighteval|lambada:openai:de
    • lighteval|lambada:openai:en
    • lighteval|lambada:openai:es
    • lighteval|lambada:openai:fr
    • lighteval|lambada:openai:it
    • lighteval|lambada:openai_cloze
    • lighteval|lambada:standard
    • lighteval|lambada:standard_cloze
    • lighteval|logiqa
    • lighteval|math:algebra
    • lighteval|math:counting_and_probability
    • lighteval|math:geometry
    • lighteval|math:intermediate_algebra
    • lighteval|math:number_theory
    • lighteval|math:prealgebra
    • lighteval|math:precalculus
    • lighteval|math_cot:algebra
    • lighteval|math_cot:counting_and_probability
    • lighteval|math_cot:geometry
    • lighteval|math_cot:intermediate_algebra
    • lighteval|math_cot:number_theory
    • lighteval|math_cot:prealgebra
    • lighteval|math_cot:precalculus
    • lighteval|mathqa
    • lighteval|mgsm:bn
    • lighteval|mgsm:de
    • lighteval|mgsm:en
    • lighteval|mgsm:es
    • lighteval|mgsm:fr
    • lighteval|mgsm:ja
    • lighteval|mgsm:ru
    • lighteval|mgsm:sw
    • lighteval|mgsm:te
    • lighteval|mgsm:th
    • lighteval|mgsm:zh
    • lighteval|mtnt2019:en-fr
    • lighteval|mtnt2019:en-ja
    • lighteval|mtnt2019:fr-en
    • lighteval|mtnt2019:ja-en
    • lighteval|mutual
    • lighteval|mutual_plus
    • lighteval|openbookqa
    • lighteval|piqa
    • lighteval|prost
    • lighteval|pubmedqa
    • lighteval|qa4mre:2011
    • lighteval|qa4mre:2012
    • lighteval|qa4mre:2013
    • lighteval|qasper
    • lighteval|qasper_ll
    • lighteval|race:high
    • lighteval|sciq
    • lighteval|storycloze:2016
    • lighteval|storycloze:2018
    • lighteval|super_glue:boolq
    • lighteval|super_glue:cb
    • lighteval|super_glue:copa
    • lighteval|super_glue:multirc
    • lighteval|super_glue:rte
    • lighteval|super_glue:wic
    • lighteval|super_glue:wsc
    • lighteval|swag
    • lighteval|the_pile:arxiv
    • lighteval|the_pile:bookcorpus2
    • lighteval|the_pile:books3
    • lighteval|the_pile:dm-mathematics
    • lighteval|the_pile:enron
    • lighteval|the_pile:europarl
    • lighteval|the_pile:freelaw
    • lighteval|the_pile:github
    • lighteval|the_pile:gutenberg
    • lighteval|the_pile:hackernews
    • lighteval|the_pile:nih-exporter
    • lighteval|the_pile:opensubtitles
    • lighteval|the_pile:openwebtext2
    • lighteval|the_pile:philpapers
    • lighteval|the_pile:pile-cc
    • lighteval|the_pile:pubmed-abstracts
    • lighteval|the_pile:pubmed-central
    • lighteval|the_pile:stackexchange
    • lighteval|the_pile:ubuntu-irc
    • lighteval|the_pile:uspto
    • lighteval|the_pile:wikipedia
    • lighteval|the_pile:youtubesubtitles
    • lighteval|toxigen
    • lighteval|triviaqa
    • lighteval|truthfulqa:gen
    • lighteval|unscramble:anagrams1
    • lighteval|unscramble:anagrams2
    • lighteval|unscramble:cycle_letters
    • lighteval|unscramble:random_insertion
    • lighteval|unscramble:reversed_words
    • lighteval|webqs
    • lighteval|wikitext:2
    • lighteval|wmt08:cs-en
    • lighteval|wmt08:de-en
    • lighteval|wmt08:en-cs
    • lighteval|wmt08:en-de
    • lighteval|wmt08:en-es
    • lighteval|wmt08:en-fr
    • lighteval|wmt08:en-hu
    • lighteval|wmt08:es-en
    • lighteval|wmt08:fr-en
    • lighteval|wmt08:hu-en
    • lighteval|wmt09:cs-en
    • lighteval|wmt09:de-en
    • lighteval|wmt09:en-cs
    • lighteval|wmt09:en-de
    • lighteval|wmt09:en-es
    • lighteval|wmt09:en-fr
    • lighteval|wmt09:en-hu
    • lighteval|wmt09:en-it
    • lighteval|wmt09:es-en
    • lighteval|wmt09:fr-en
    • lighteval|wmt09:hu-en
    • lighteval|wmt09:it-en
    • lighteval|wmt10:cs-en
    • lighteval|wmt10:de-en
    • lighteval|wmt10:en-cs
    • lighteval|wmt10:en-de
    • lighteval|wmt10:en-es
    • lighteval|wmt10:en-fr
    • lighteval|wmt10:es-en
    • lighteval|wmt10:fr-en
    • lighteval|wmt11:cs-en
    • lighteval|wmt11:de-en
    • lighteval|wmt11:en-cs
    • lighteval|wmt11:en-de
    • lighteval|wmt11:en-es
    • lighteval|wmt11:en-fr
    • lighteval|wmt11:es-en
    • lighteval|wmt11:fr-en
    • lighteval|wmt12:cs-en
    • lighteval|wmt12:de-en
    • lighteval|wmt12:en-cs
    • lighteval|wmt12:en-de
    • lighteval|wmt12:en-es
    • lighteval|wmt12:en-fr
    • lighteval|wmt12:es-en
    • lighteval|wmt12:fr-en
    • lighteval|wmt13:cs-en
    • lighteval|wmt13:de-en
    • lighteval|wmt13:en-cs
    • lighteval|wmt13:en-de
    • lighteval|wmt13:en-es
    • lighteval|wmt13:en-fr
    • lighteval|wmt13:en-ru
    • lighteval|wmt13:es-en
    • lighteval|wmt13:fr-en
    • lighteval|wmt13:ru-en
    • lighteval|wmt14:cs-en
    • lighteval|wmt14:de-en
    • lighteval|wmt14:en-cs
    • lighteval|wmt14:en-de
    • lighteval|wmt14:en-fr
    • lighteval|wmt14:en-hi
    • lighteval|wmt14:en-ru
    • lighteval|wmt14:fr-en
    • lighteval|wmt14:hi-en
    • lighteval|wmt14:ru-en
    • lighteval|wmt15:cs-en
    • lighteval|wmt15:de-en
    • lighteval|wmt15:en-cs
    • lighteval|wmt15:en-de
    • lighteval|wmt15:en-fi
    • lighteval|wmt15:en-fr
    • lighteval|wmt15:en-ru
    • lighteval|wmt15:fi-en
    • lighteval|wmt15:fr-en
    • lighteval|wmt15:ru-en
    • lighteval|wmt16:cs-en
    • lighteval|wmt16:de-en
    • lighteval|wmt16:en-cs
    • lighteval|wmt16:en-de
    • lighteval|wmt16:en-fi
    • lighteval|wmt16:en-ro
    • lighteval|wmt16:en-ru
    • lighteval|wmt16:en-tr
    • lighteval|wmt16:fi-en
    • lighteval|wmt16:ro-en
    • lighteval|wmt16:ru-en
    • lighteval|wmt16:tr-en
    • lighteval|wmt17:cs-en
    • lighteval|wmt17:de-en
    • lighteval|wmt17:en-cs
    • lighteval|wmt17:en-de
    • lighteval|wmt17:en-fi
    • lighteval|wmt17:en-lv
    • lighteval|wmt17:en-ru
    • lighteval|wmt17:en-tr
    • lighteval|wmt17:en-zh
    • lighteval|wmt17:fi-en
    • lighteval|wmt17:lv-en
    • lighteval|wmt17:ru-en
    • lighteval|wmt17:tr-en
    • lighteval|wmt17:zh-en
    • lighteval|wmt18:cs-en
    • lighteval|wmt18:de-en
    • lighteval|wmt18:en-cs
    • lighteval|wmt18:en-de
    • lighteval|wmt18:en-et
    • lighteval|wmt18:en-fi
    • lighteval|wmt18:en-ru
    • lighteval|wmt18:en-tr
    • lighteval|wmt18:en-zh
    • lighteval|wmt18:et-en
    • lighteval|wmt18:fi-en
    • lighteval|wmt18:ru-en
    • lighteval|wmt18:tr-en
    • lighteval|wmt18:zh-en
    • lighteval|wmt19:cs-de
    • lighteval|wmt19:de-cs
    • lighteval|wmt19:de-en
    • lighteval|wmt19:de-fr
    • lighteval|wmt19:en-cs
    • lighteval|wmt19:en-de
    • lighteval|wmt19:en-fi
    • lighteval|wmt19:en-gu
    • lighteval|wmt19:en-kk
    • lighteval|wmt19:en-lt
    • lighteval|wmt19:en-ru
    • lighteval|wmt19:en-zh
    • lighteval|wmt19:fi-en
    • lighteval|wmt19:fr-de
    • lighteval|wmt19:gu-en
    • lighteval|wmt19:kk-en
    • lighteval|wmt19:lt-en
    • lighteval|wmt19:ru-en
    • lighteval|wmt19:zh-en
    • lighteval|wmt20:cs-en
    • lighteval|wmt20:de-en
    • lighteval|wmt20:de-fr
    • lighteval|wmt20:en-cs
    • lighteval|wmt20:en-de
    • lighteval|wmt20:en-iu
    • lighteval|wmt20:en-ja
    • lighteval|wmt20:en-km
    • lighteval|wmt20:en-pl
    • lighteval|wmt20:en-ps
    • lighteval|wmt20:en-ru
    • lighteval|wmt20:en-ta
    • lighteval|wmt20:en-zh
    • lighteval|wmt20:fr-de
    • lighteval|wmt20:iu-en
    • lighteval|wmt20:ja-en
    • lighteval|wmt20:km-en
    • lighteval|wmt20:pl-en
    • lighteval|wmt20:ps-en
    • lighteval|wmt20:ru-en
    • lighteval|wmt20:ta-en
    • lighteval|wmt20:zh-en
    • lighteval|wsc273
    • lighteval|xcopa:en
    • lighteval|xcopa:et
    • lighteval|xcopa:ht
    • lighteval|xcopa:id
    • lighteval|xcopa:it
    • lighteval|xcopa:qu
    • lighteval|xcopa:sw
    • lighteval|xcopa:ta
    • lighteval|xcopa:th
    • lighteval|xcopa:tr
    • lighteval|xcopa:vi
    • lighteval|xcopa:zh
    • lighteval|xstory_cloze:ar
    • lighteval|xstory_cloze:en
    • lighteval|xstory_cloze:es
    • lighteval|xstory_cloze:eu
    • lighteval|xstory_cloze:hi
    • lighteval|xstory_cloze:id
    • lighteval|xstory_cloze:my
    • lighteval|xstory_cloze:ru
    • lighteval|xstory_cloze:sw
    • lighteval|xstory_cloze:te
    • lighteval|xstory_cloze:zh
    • lighteval|xwinograd:en
    • lighteval|xwinograd:fr
    • lighteval|xwinograd:jp
    • lighteval|xwinograd:pt
    • lighteval|xwinograd:ru
    • lighteval|xwinograd:zh
  • original:

    • original|arc:c:letters
    • original|arc:c:options
    • original|arc:c:simple
    • original|mmlu
    • original|mmlu:abstract_algebra
    • original|mmlu:anatomy
    • original|mmlu:astronomy
    • original|mmlu:business_ethics
    • original|mmlu:clinical_knowledge
    • original|mmlu:college_biology
    • original|mmlu:college_chemistry
    • original|mmlu:college_computer_science
    • original|mmlu:college_mathematics
    • original|mmlu:college_medicine
    • original|mmlu:college_physics
    • original|mmlu:computer_security
    • original|mmlu:conceptual_physics
    • original|mmlu:econometrics
    • original|mmlu:electrical_engineering
    • original|mmlu:elementary_mathematics
    • original|mmlu:formal_logic
    • original|mmlu:global_facts
    • original|mmlu:high_school_biology
    • original|mmlu:high_school_chemistry
    • original|mmlu:high_school_computer_science
    • original|mmlu:high_school_european_history
    • original|mmlu:high_school_geography
    • original|mmlu:high_school_government_and_politics
    • original|mmlu:high_school_macroeconomics
    • original|mmlu:high_school_mathematics
    • original|mmlu:high_school_microeconomics
    • original|mmlu:high_school_physics
    • original|mmlu:high_school_psychology
    • original|mmlu:high_school_statistics
    • original|mmlu:high_school_us_history
    • original|mmlu:high_school_world_history
    • original|mmlu:human_aging
    • original|mmlu:human_sexuality
    • original|mmlu:international_law
    • original|mmlu:jurisprudence
    • original|mmlu:logical_fallacies
    • original|mmlu:machine_learning
    • original|mmlu:management
    • original|mmlu:marketing
    • original|mmlu:medical_genetics
    • original|mmlu:miscellaneous
    • original|mmlu:moral_disputes
    • original|mmlu:moral_scenarios
    • original|mmlu:nutrition
    • original|mmlu:philosophy
    • original|mmlu:prehistory
    • original|mmlu:professional_accounting
    • original|mmlu:professional_law
    • original|mmlu:professional_medicine
    • original|mmlu:professional_psychology
    • original|mmlu:public_relations
    • original|mmlu:security_studies
    • original|mmlu:sociology
    • original|mmlu:us_foreign_policy
    • original|mmlu:virology
    • original|mmlu:world_religions