[EN] Upload guide (jsonl)

Basic Requirements

  • Upload one jsonl file per model (e.g., five files to compare five LLMs)
  • ⚠️ Important: All jsonl files must have the same number of rows
  • ⚠️ Important: Each file must use a single model_id on every row, and that model_id must be unique across files (a validation sketch follows this list)
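Below is a minimal pre-upload check, a sketch assuming hypothetical file names; it verifies equal row counts and model_id consistency/uniqueness using only the standard library:

```python
import json
from pathlib import Path

# Hypothetical file names; substitute the jsonl files you plan to upload.
files = ["model1.jsonl", "model2.jsonl", "model3.jsonl"]

row_counts = {}
file_model_ids = {}
for path in files:
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    rows = [json.loads(line) for line in lines if line.strip()]
    row_counts[path] = len(rows)
    # Each file should carry exactly one model_id on every row.
    ids = {row["model_id"] for row in rows}
    assert len(ids) == 1, f"{path}: mixed model_id values {ids}"
    file_model_ids[path] = ids.pop()

# All files must have the same number of rows.
assert len(set(row_counts.values())) == 1, f"row counts differ: {row_counts}"
# model_id must not collide across files.
assert len(set(file_model_ids.values())) == len(files), (
    f"duplicate model_id across files: {file_model_ids}"
)
print("all files pass the basic checks")
```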

Required Fields

  • Per Model Fields

    • model_id: Unique identifier for the model (recommendation: keep it short)
    • generated: The LLM's response to the test instruction
  • Required only for translation tasks (the translation_pair prompt needs these fields; see streamlit_app_local/user_submit/mt/llama5.jsonl)

    • source_lang: Input language (e.g., Korean, KR, kor, ...)
    • target_lang: Output language (e.g., English, EN, ...)
  • Common Fields (must be identical across all files)

    • instruction: The input prompt or test instruction given to the model
    • task: Category label used to group results (useful when using different evaluation prompts per task); a sample row combining all of these fields follows this list
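Putting these fields together, here is a minimal sketch of a single jsonl row; all values are illustrative, and source_lang/target_lang should be omitted for non-translation tasks:

```python
import json

# Illustrative values only; a translation row needs the two extra fields.
record = {
    "model_id": "model1",        # per-model: one value per file, unique across files
    "generated": "Hello",        # per-model: this model's response
    "task": "translation",       # common: identical across all files
    "instruction": "Translate to English: 안녕하세요",  # common: identical across all files
    "source_lang": "Korean",     # translation tasks only
    "target_lang": "English",    # translation tasks only
}

# jsonl = one JSON object per line.
with open("model1.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```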

Example Format

```
# model1.jsonl
{"model_id": "model1", "task": "directions", "instruction": "Where should I go?", "generated": "Over there"}
{"model_id": "model1", "task": "arithmetic", "instruction": "1+1", "generated": "2"}

# model2.jsonl
{"model_id": "model2", "task": "directions", "instruction": "Where should I go?", "generated": "Head north"}
{"model_id": "model2", "task": "arithmetic", "instruction": "1+1", "generated": "3"}
...
```
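To see why the common fields must match, the sketch below pairs the two example files row by row. This is only an illustration assuming rows appear in the same order in every file; it is not VARCO Arena's actual matching logic:

```python
import json

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

rows1 = load_jsonl("model1.jsonl")
rows2 = load_jsonl("model2.jsonl")
for r1, r2 in zip(rows1, rows2):
    # The common fields must line up so that responses to the same
    # instruction can be judged against each other.
    assert (r1["task"], r1["instruction"]) == (r2["task"], r2["instruction"])
    print(f"[{r1['task']}] {r1['instruction']!r}")
    print(f"  {r1['model_id']}: {r1['generated']!r}")
    print(f"  {r2['model_id']}: {r2['generated']!r}")
```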

Use Case Example

If you want to compare different prompting strategies for the same model (a sketch follows this list):

  • Use the same instruction values across files (a unified set of test scenarios).
  • The generated responses will differ across files, reflecting each prompting strategy.
  • Use descriptive model_id values such as "prompt1", "prompt2", etc.
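A hypothetical sketch of this workflow is below; run_llm stands in for your own model call and is not a real API, and the strategy names and templates are made up:

```python
import json

# Shared test scenarios: identical task/instruction pairs for every file.
instructions = [
    ("directions", "Where should I go?"),
    ("arithmetic", "1+1"),
]

# One prompting strategy per output file; keys become the model_id values.
strategies = {
    "prompt1": "Answer concisely: {q}",
    "prompt2": "Think step by step, then answer: {q}",
}

def run_llm(prompt: str) -> str:
    return "..."  # placeholder for an actual model call

for model_id, template in strategies.items():
    with open(f"{model_id}.jsonl", "w", encoding="utf-8") as f:
        for task, instruction in instructions:
            row = {
                "model_id": model_id,        # unique per file
                "task": task,                # identical across files
                "instruction": instruction,  # identical across files
                "generated": run_llm(template.format(q=instruction)),
            }
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
```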