# [EN] Upload guide (jsonl)
## Basic Requirements
- Upload one `jsonl` file per model (e.g., five files to compare five LLMs)
- ⚠️ Important: All `jsonl` files must have the same number of rows
- ⚠️ Important: The `model_id` field must be unique within and across all files (both constraints are checked in the sketch below)
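As a quick sanity check before uploading, a few lines of Python can verify both constraints. This is a minimal sketch, not part of the app; the file names are placeholders, and it assumes "unique within a file" means every row in a file carries the same `model_id`.

```python
import json
import sys
from pathlib import Path

def check_files(paths):
    """Verify row counts match and model_id values don't collide across files."""
    row_counts = {}
    seen_model_ids = set()
    for path in paths:
        lines = [ln for ln in Path(path).read_text(encoding="utf-8").splitlines() if ln.strip()]
        rows = [json.loads(ln) for ln in lines]
        row_counts[path] = len(rows)
        # Assumption: one model per file, so every row shares one model_id.
        ids = {row["model_id"] for row in rows}
        if len(ids) != 1:
            sys.exit(f"{path}: expected a single model_id, found {sorted(ids)}")
        if ids & seen_model_ids:
            sys.exit(f"{path}: model_id {ids.pop()!r} is already used by another file")
        seen_model_ids |= ids
    # All files must have the same number of rows.
    if len(set(row_counts.values())) != 1:
        sys.exit(f"row counts differ: {row_counts}")
    print("OK:", row_counts)

if __name__ == "__main__":
    check_files(["model1.jsonl", "model2.jsonl"])  # placeholder file names
```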
## Required Fields
### Per Model Fields
- `model_id`: Unique identifier for the model (recommendation: keep it short)
- `generated`: The LLM's response to the test instruction

Required only for Translation (the `translation_pair` prompt needs these; see `streamlit_app_local/user_submit/mt/llama5.jsonl` and the example below):
- `source_lang`: input language (e.g. Korean, KR, kor, ...)
- `target_lang`: output language (e.g. English, EN, ...)
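For instance, a row in a translation file might look like this (the values and the `task` label are illustrative, not taken from the sample file):

```
{"model_id": "model1", "task": "translation", "instruction": "안녕하세요", "generated": "Hello", "source_lang": "Korean", "target_lang": "English"}
```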
## Common Fields (Must be identical across all files)
- `instruction`: The input prompt or test instruction given to the model
- `task`: Category label used to group results (useful when using different evaluation prompts per task)
## Example Format

```
# model1.jsonl
{"model_id": "model1", "task": "directions", "instruction": "Where should I go?", "generated": "Over there"}
{"model_id": "model1", "task": "arithmetic", "instruction": "1+1", "generated": "2"}
# model2.jsonl
{"model_id": "model2", "task": "directions", "instruction": "Where should I go?", "generated": "Head north"}
{"model_id": "model2", "task": "arithmetic", "instruction": "1+1", "generated": "3"}
...
```
## Use Case Example
If you want to compare different prompting strategies for the same model:
- Use the same `instruction` across files (using unified test scenarios); the `generated` responses will then vary across files according to each prompting strategy (see the example below).
- Use descriptive `model_id` values like "prompt1", "prompt2", etc.
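For example, two such files might look like this (file contents are illustrative):

```
# prompt1.jsonl
{"model_id": "prompt1", "task": "directions", "instruction": "Where should I go?", "generated": "Go straight ahead"}

# prompt2.jsonl
{"model_id": "prompt2", "task": "directions", "instruction": "Where should I go?", "generated": "You should head east"}
```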