Spaces: NCSOFT / VARCO_Arena


sonsus committed about 1 month ago

Commit c2ba4d5 (1 parent: 4674242): others


This view is limited to 50 files because it contains too many changes.

Files changed (50)

README.md +122 -0
README_kr.md +123 -0
__pycache__/analysis_utils.cpython-311.pyc +0 -0
__pycache__/app.cpython-311.pyc +0 -0
__pycache__/query_comp.cpython-311.pyc +0 -0
__pycache__/view_utils.cpython-311.pyc +0 -0
analysis_utils.py +338 -0
app.py +452 -0
eval_models_list.txt +3 -0
eval_prompt_list.txt +3 -0
guide_mds/input_jsonls_en.md +37 -0
guide_mds/input_jsonls_kr.md +34 -0
modules/__pycache__/nav.cpython-311.pyc +0 -0
modules/nav.py +57 -0
pages/__pycache__/see_results.cpython-311.pyc +0 -0
pages/brief_intro.py +40 -0
pages/quick_start_guide.py +14 -0
pages/see_results.py +365 -0
precommit.sh +5 -0
query_comp.py +137 -0
requirements.txt +14 -0
run.sh +4 -0
streamlit_app_local/.streamlit/config.toml +4 -0
streamlit_app_local/README.md +24 -0
streamlit_app_local/__pycache__/analysis_utils.cpython-311.pyc +0 -0
streamlit_app_local/__pycache__/analysis_utils.cpython-38.pyc +0 -0
streamlit_app_local/__pycache__/app.cpython-311.pyc +0 -0
streamlit_app_local/__pycache__/app.cpython-38.pyc +0 -0
streamlit_app_local/__pycache__/query_comp.cpython-311.pyc +0 -0
streamlit_app_local/__pycache__/view_utils.cpython-311.pyc +0 -0
streamlit_app_local/__pycache__/view_utils.cpython-38.pyc +0 -0
streamlit_app_local/analysis_utils.py +338 -0
streamlit_app_local/app.py +318 -0
streamlit_app_local/eval_models_list.txt +3 -0
streamlit_app_local/eval_prompt_list.txt +5 -0
streamlit_app_local/guide_mds +1 -0
streamlit_app_local/modules/__pycache__/nav.cpython-311.pyc +0 -0
streamlit_app_local/modules/__pycache__/nav.cpython-38.pyc +0 -0
streamlit_app_local/modules/nav.py +46 -0
streamlit_app_local/pages/brief_intro.py +40 -0
streamlit_app_local/pages/see_results.py +391 -0
streamlit_app_local/query_comp.py +135 -0
streamlit_app_local/requirements.txt +5 -0
streamlit_app_local/run.sh +3 -0
streamlit_app_local/user_submit/12-02_14:29:30/1.jsonl +6 -0
streamlit_app_local/user_submit/12-02_14:29:30/2.jsonl +6 -0
streamlit_app_local/user_submit/llm/240413_dora.jsonl +6 -0
streamlit_app_local/user_submit/llm/240413_dora_epoch_5.jsonl +6 -0
streamlit_app_local/user_submit/llm/240413_full.jsonl +6 -0
streamlit_app_local/user_submit/llm/240413_full_epoch_5.jsonl +6 -0

README.md CHANGED

@@ -12,3 +12,125 @@ short_description: VARCO Arena is a reference-free LLM benchmarking approach
 ---
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# Varco Arena
+Varco Arena runs tournaments among the models being compared for each instruction in the test set, ranking them accurately at an affordable price. This is more accurate and more cost-effective than computing win rates against reference outputs.
+For more information, the following resources explain how it works.
+* [Paper](https://huggingface.co/papers/2411.01281)
+* [Blog Post (KR)](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
+## Quickstart
+### Running Web Demo locally (streamlit, Recommended!)
+```bash
+git clone [THIS_REPO]
+# install the requirements below; we recommend miniforge for managing the environment
+cd streamlit_app_local
+bash run.sh
+```
+For more details, see `[THIS_REPO]/streamlit_app_local/README.md`
+### CLI use
+* The CLI code is located at
+  * `varco_arena/`
+* VS Code debug configurations are at
+  * `varco_arena/.vscode`
+```bash
+## gpt-4o-mini as a judge
+python main.py -i "./some/dirpath/to/jsonl/files" -o SOME_REL_PATH_TO_CREATE -m tournament -e "gpt-4o-mini"
+## vllm-openai served LLM as a judge
+python main.py -i "./some/dirpath/to/jsonl/files" -o SOME_REL_PATH_TO_CREATE -e SOME_MODEL_NAME_SERVED -m tournament -u "http://url_to/your/vllm_openai_server:someport"
+# dbg lines
+## openai api judge dbg
+python main.py -i "rsc/inputs_for_dbg/dbg_400_error_inputs/" -o SOME_WANTED_TARGET_DIR -e gpt-4o-mini
+## other testing lines
+python main.py -i "rsc/inputs_for_dbg/[SOME_DIRECTORY]/" -o SOME_WANTED_TARGET_DIR -e gpt-4o-mini
+## dummy judge dbg (checking errors without api requests)
+python main.py -i "rsc/inputs_for_dbg/dbg_400_error_inputs/" -o SOME_WANTED_TARGET_DIR -e debug
+```
+## Requirements
+Tested on `python = 3.11.9`; dependencies are listed in `requirements.txt`:
+```
+openai>=1.17.0
+munch
+pandas
+numpy
+tqdm>=4.48.0
+plotly
+scikit-learn
+kaleido
+tiktoken>=0.7.0
+pyyaml
+transformers
+streamlit>=1.40.2
+openpyxl
+git+https://github.com/shobrook/openlimit.git#egg=openlimit # do not install this from PyPI
+# Linux
+uvloop
+# Windows
+winloop
+```
+#### Arguments
+- -i, --input : directory containing the input jsonlines files (LLM outputs)
+- -o, --output_dir : directory where the results are written
+- -e, --evaluation : judge model (e.g. "gpt-4o-2024-05-13", "gpt-4o-mini", \[vllm-served-model-name\])
+- -m, --matching_method : matching method (default "tournament"; "league" is not recommended)
+- -k, --openai_api_key : OpenAI API key
+- -u, --openai_url : URL of an OpenAI-style LLM server, requested through the openai SDK (e.g. a vLLM OpenAI server)
+#### Advanced
+- -j, --n_jobs : number of concurrent jobs, passed to `asyncio.Semaphore()` (default: 32; lower it if the Arena stalls)
+- -p, --evalprompt : [see the directory](./varco_arena/prompts/*.yaml)
+- -lr, --limit_requests : vLLM OpenAI server request limit (default: 7,680)
+- -lt, --limit_tokens : vLLM OpenAI server token limit (default: 15,728,640)
+#### Input Data Format
+[input jsonl guides](./streamlit_app_local/guide_mds/input_jsonls_en.md)
+## Contributing & Customizing
+#### Do this after git clone and installation
+```bash
+pip install pre-commit
+pre-commit install
+```
+#### Before committing
+```bash
+bash precommit.sh # the black formatter will reformat the code
+```
+## FAQ
+* I want to run Varco Arena with my own judge prompt
+  * [`./varco_arena/prompts/`](./varco_arena/prompts/__init__.py) defines the prompts as `yaml` files, along with the classes that load them. Edit them as needed.
+* I want different judge prompts for different rows of the test set (e.g. rows 1-100 use `prompt1`, rows 101 onward use `prompt2`)
+  * `load_prompt` (at the link above) takes `promptname` and `task` as parameters to load a prompt. It is called in [`./varco_arena/manager.py:async_run`](./varco_arena/manager.py).
+* I want more fields in my LLM-output jsonl files for tailored use, beyond `instruction`, `source`, and `generated`.
+  * This gets tricky, but here is a brief guide.
+    * You will probably need to edit `async_eval_w_prompt` in `varco_arena/eval_utils.py` (this is where `PROMPT_OBJ.complete_prompt()` is called).
+    * All related code will need to be revised as well.
+## Special Thanks to (contributors)
+- Minho Lee (@Dialogue Model Team, NCSOFT) [github](https://github.com/minolee/)
+  - query wrapper
+  - rag prompt
+- Jumin Oh (@Generation Model Team, NCSOFT)
+  - rapid prototyping of the overall system
+## Citation
+If you found our work helpful, consider citing our paper!
+```
+@misc{son2024varcoarenatournamentapproach,
+      title={Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models},
+      author={Seonil Son and Ju-Min Oh and Heegon Jin and Cheolhun Jang and Jeongbeom Jeong and Kuntae Kim},
+      year={2024},
+      eprint={2411.01281},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2411.01281},
+}
+```
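A minimal sketch of preparing inputs for the CLI or the web demo, assuming one jsonl file per model (with the file name standing in for the model name) and only the three fields the FAQ mentions: `instruction`, `source`, and `generated`. The model names, rows, and the helpers `to_jsonl_lines` / `write_inputs` are hypothetical; the input jsonl guide linked above is the authority on the exact schema (for example, whether a `task` field is expected).

```python
import json
from pathlib import Path

# Hypothetical test set rows; `source` holds any context passed with the instruction.
TESTSET = [
    {"instruction": "What is the capital of France?", "source": ""},
    {"instruction": "Add the two numbers.", "source": "2, 2"},
]

# Hypothetical outputs of two models, aligned with TESTSET row by row.
RESPONSES = {
    "model_x": ["Paris is the capital of France.", "4"],
    "model_y": ["The capital is Paris.", "2 + 2 = 4"],
}


def to_jsonl_lines(testset: list[dict], outputs: list[str]) -> list[str]:
    """Pair each test row with one model's output and serialize it as a JSON line."""
    if len(testset) != len(outputs):
        raise ValueError("every model needs exactly one output per test row")
    return [
        json.dumps({**row, "generated": gen}, ensure_ascii=False)
        for row, gen in zip(testset, outputs)
    ]


def write_inputs(out_dir: Path) -> list[Path]:
    """Write one <model>.jsonl file per model into out_dir (the directory given to -i)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for model, outputs in RESPONSES.items():
        path = out_dir / f"{model}.jsonl"
        text = "\n".join(to_jsonl_lines(TESTSET, outputs)) + "\n"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_inputs(Path("./my_inputs"))` and then pointing `-i` at `./my_inputs` would match the CLI examples above. At least two files are needed, since a tournament needs at least two participants.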

README_kr.md ADDED

	@@ -0,0 +1,123 @@

+# Varco Arena
+VARCO Arena ranks models accurately by running tournaments among the models being compared for each instruction in the test set. This is more accurate, and somewhat cheaper, than computing win rates against reference outputs.
+For more details, see the links below.
+* [Paper](https://huggingface.co/papers/2411.01281)
+* [NCSOFT Tech Blog (KR)](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
+## Quickstart
+### Running the Streamlit app locally (recommended!)
+```bash
+git clone [THIS_REPO]
+# install the requirements below; we recommend miniforge for managing the environment
+cd streamlit_app_local
+bash run.sh
+```
+For more details, see `[THIS_REPO]/streamlit_app_local/README.md`!
+### CLI usage
+* The CLI and the web app share the same code, located in the directory below.
+  * `varco_arena/`
+* Test commands for each preset prompt, for debugging in VS Code, are listed in the following file.
+  * `varco_arena/.vscode/launch.json`
+```bash
+## gpt-4o-mini as a judge
+python main.py -i "./some/dirpath/to/jsonl/files" -o SOME_REL_PATH_TO_CREATE -m tournament -e "gpt-4o-mini"
+## vllm-openai served LLM as a judge
+python main.py -i "./some/dirpath/to/jsonl/files" -o SOME_REL_PATH_TO_CREATE -e SOME_MODEL_NAME_SERVED -m tournament -u "http://url_to/your/vllm_openai_server:someport"
+# dbg lines
+## openai api judge dbg
+python main.py -i "rsc/inputs_for_dbg/dbg_400_error_inputs/" -o SOME_WANTED_TARGET_DIR -e gpt-4o-mini
+## other testing lines
+python main.py -i "rsc/inputs_for_dbg/[SOME_DIRECTORY]/" -o SOME_WANTED_TARGET_DIR -e gpt-4o-mini
+## dummy judge dbg (checking errors without api requests)
+python main.py -i "rsc/inputs_for_dbg/dbg_400_error_inputs/" -o SOME_WANTED_TARGET_DIR -e debug
+```
+## Requirements
+Tested on `python = 3.11.9`; dependencies are listed in `requirements.txt`:
+```
+openai>=1.17.0
+munch
+pandas
+numpy
+tqdm>=4.48.0
+plotly
+scikit-learn
+kaleido
+tiktoken>=0.7.0
+pyyaml
+transformers
+streamlit>=1.40.2
+openpyxl
+git+https://github.com/shobrook/openlimit.git#egg=openlimit # do not install this by pypi
+# if on Linux
+uvloop
+# if on Windows
+winloop
+```
+#### Arguments
+- -i, --input : input file, directory, or regular expression for file names
+- -o, --output_dir : directory where the output files are saved
+- -e, --evaluation : judge model (e.g. "gpt-4o-2024-05-13", "gpt-4o-mini", a model name served with vLLM, etc.)
+- -m, --matching_method : matching method (default "tournament"; "league" is not recommended)
+- -k, --openai_api_key : OpenAI API key
+- -u, --openai_url : URL (IP address + port) when using a local vLLM OpenAI server
+#### Advanced
+- -j, --n_jobs : argument passed to `asyncio.Semaphore()`. If the Arena does not make progress, lower it below the default of 32
+- -p, --evalprompt : [see the directory](./varco_arena/prompts/*.yaml)
+- -lr, --limit_requests : vLLM OpenAI server request limit (default: 7,680)
+- -lt, --limit_tokens : vLLM OpenAI server token limit (default: 15,728,640)
+#### Input Data Format
+[input jsonl guide](./streamlit_app_local/guide_mds/input_jsonls_kr.md)
+## Contributing & Customizing
+#### After git clone and installing dependencies
+```bash
+pip install pre-commit
+pre-commit install
+```
+#### Before committing
+```bash
+bash precommit.sh # this reformats all the code
+```
+Contact: Seonil Son
+* I want to use my own prompt
+  * [`./varco_arena/prompts/`](./varco_arena_core/prompts/__init__.py) loads the prompt classes and the prompts defined as `yaml` files. Write your own by following the presets.
+* I want a different judge prompt per test set (e.g. a different prompt for each task)
+  * Prompts are loaded as `promptname` + `task` through `load_prompt` (at the link above) in [`./varco_arena_core/manager.py:async_run`](./varco_arena_core/manager.py).
+* I want to add fields other than `instruction`, `source`, and `generated` to my input files.
+  * This gets a bit complicated, but please modify the following parts
+    * You may need to adjust `async_eval_w_prompt` in `varco_arena/eval_utils.py` (this is where `PROMPT_OBJ.complete_prompt()` is called)
+    * Beyond that, you will need to trace and fix the related code one step at a time.
+## Special Thanks to (contributors)
+- Minho Lee (@Dialogue Model Team, NCSOFT) [github](https://github.com/minolee/)
+  - query wrapper
+  - rag prompt
+- Jumin Oh (@Generation Model Team, NCSOFT)
+  - rapid prototyping of the overall system
+## Citation
+If you found our work helpful, please consider citing our paper! 😉
+```
+@misc{son2024varcoarenatournamentapproach,
+      title={Varco Arena: A Tournament Approach to Reference-Free Benchmarking Large Language Models},
+      author={Seonil Son and Ju-Min Oh and Heegon Jin and Cheolhun Jang and Jeongbeom Jeong and Kuntae Kim},
+      year={2024},
+      eprint={2411.01281},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2411.01281},
+}
+```
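The `-m` option listed above trades off cost. A rough sketch of why "tournament" is the cheaper default: in a single-elimination bracket every match eliminates exactly one model, so n models need n - 1 matches per test instruction (the same count `number_breakdown_from_df` in `analysis_utils.py` assumes). Assuming "league" means a round robin in which every pair of models meets once, it needs n(n - 1)/2 matches per instruction. The function names are illustrative, not part of VARCO Arena.

```python
def tournament_matches(n_models: int, n_instructions: int) -> int:
    """Single elimination: each match removes one model, so n - 1 matches per instruction."""
    if n_models < 2:
        raise ValueError("need at least two models")
    return (n_models - 1) * n_instructions


def league_matches(n_models: int, n_instructions: int) -> int:
    """Round robin (assumed meaning of "league"): every pair meets once per instruction."""
    if n_models < 2:
        raise ValueError("need at least two models")
    return n_models * (n_models - 1) // 2 * n_instructions


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(
            f"{n} models, 500 instructions: "
            f"tournament={tournament_matches(n, 500)}, league={league_matches(n, 500)}"
        )
```

For 8 models and 500 instructions this gives 3,500 tournament matches versus 14,000 league matches; the ratio between the two grows as n/2.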

__pycache__/analysis_utils.cpython-311.pyc ADDED

Binary file (17.7 kB).

__pycache__/app.cpython-311.pyc ADDED

Binary file (22.3 kB).

__pycache__/query_comp.cpython-311.pyc ADDED

Binary file (7.99 kB).

__pycache__/view_utils.cpython-311.pyc ADDED

Binary file (18.3 kB).

analysis_utils.py ADDED

	@@ -0,0 +1,338 @@

+from copy import deepcopy
+from functools import partial
+from typing import *
+import pandas as pd
+from fire import Fire
+"""
+This code assumes it handles only one instruction at a time
+"""
+# from varco_arena.tournament
+def log2_power_of_two(n):
+    # First, let's make sure n is indeed a power of 2
+    if n & (n - 1) != 0 or n == 0:
+        raise ValueError("n must be a positive power of 2")
+    exponent = 0
+    while n > 1:
+        n >>= 1  # Right shift is like dividing by 2, but faster
+        exponent += 1
+    return exponent
+def get_1st(df: pd.DataFrame, alpha2names: dict) -> Optional[str]:
+    finals = df[df["round"] == "final"]
+    if len(finals) == 1:
+        first = finals.iloc[0].winner_resolved
+    else:
+        first = (
+            None  # error case (no finals match or multiple finals (buggy result file))
+        )
+    return first
+def get_unique_participants(df: pd.DataFrame) -> list:
+    participants = pd.concat([df.model_a, df.model_b]).unique().tolist()
+    participants = [p for p in participants if p]  # remove None
+    participants = sorted(participants)  # make it sorted
+    return participants
+def _impute_byes(df):
+    max_depth = df.depth.max()
+    # init
+    imputed_parts = dict()
+    for depth in range(max_depth + 1):
+        imputed_parts[depth] = df[df.depth == depth].copy()
+    # reverse
+    for depth in range(max_depth, 0, -1):  # always we have 1 proper match for depth=0
+        null_v_null = {
+            "model_a": "",
+            "model_b": "",
+            "winner": "model_a",
+            "match_order_in_round": "-",
+            "depth": depth,
+        }
+        # fill some_model vs null byes
+        players = get_unique_participants(imputed_parts[depth])
+        proceeded = get_unique_participants(imputed_parts[depth - 1])
+        imputed = []
+        for p in proceeded:
+            if p not in players:
+                p_v_null = deepcopy(null_v_null)
+                p_v_null["model_a"] = p
+                imputed.append(p_v_null)
+        imputed_parts[depth] = pd.concat(
+            [
+                imputed_parts[depth],
+                pd.DataFrame(imputed),
+            ],
+            axis="index",
+        )
+        # fill null vs null
+        n_null_v_null = 2 ** (depth) - len(imputed_parts[depth])
+        if n_null_v_null > 0:
+            imputed = pd.DataFrame([null_v_null] * n_null_v_null)
+            imputed_parts[depth] = pd.concat(
+                [
+                    imputed_parts[depth],
+                    imputed,
+                ],
+                axis="index",
+            )
+    df_imputed = pd.concat(imputed_parts.values(), axis="index")
+    df_imputed = df_imputed.sort_values(by="depth").reset_index(drop=True)
+    return df_imputed
+def index_test_scenario(df) -> pd.DataFrame:
+    df["inst_src"] = "inst: " + df.instruction + "\n\nsrc: " + df.source
+    df["idx_inst_src"] = df.apply(
+        lambda row: f"{row.tournament_idx}:\n{row.inst_src}", axis=1
+    )
+    # later used for tournament bracket backtracking
+    if "depth" not in df.columns:
+        mappings = {
+            "final": 0,
+            "semi-final": 1,
+            "quarter-final": 2,
+        }
+        def _convert_round_to_depth(rnd: str, mappings=None) -> Optional[int]:
+            if rnd is None:
+                depth = None
+            elif rnd in mappings.keys():
+                depth = mappings[rnd]
+            elif rnd.startswith("round-"):  # assume perfect power of two
+                num = int(rnd.replace("round-", "").strip())
+                depth = log2_power_of_two(num) - 1
+            else:  # fail loudly instead of hitting an UnboundLocalError
+                raise ValueError(f"unknown round label: {rnd!r}")
+            return depth
+        conv = partial(_convert_round_to_depth, mappings=mappings)
+        df["depth"] = df["round"].apply(conv)
+    return df
+def init_tournament_dataframe(
+    df, alpha2names: dict = None
+) -> Tuple[pd.DataFrame, dict]:
+    df = df.sort_values(by="depth").reset_index(drop=True)
+    # make winner interpretable (A -> model_a, B -> model_b)
+    df.winner = df.winner.apply(lambda txt: f"model_{txt.lower()}")
+    # define alpha2names if not given (covers up to 168 participants)
+    if alpha2names is None:
+        # implicit concatenation: a backslash continuation inside the literal
+        # would embed the next line's indentation (spaces) into the alphabet
+        alphabets = (
+            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+            "abcdefghijklmnopqrstuvwxyz"
+            "ⓐⓑⓒⓓⓔⓕⓖⓗⓘⓙⓚⓛⓜⓝⓞⓟⓠⓡⓢⓣⓤⓥⓦⓧⓨⓩ"
+            "㉠㉡㉢㉣㉤㉥㉦㉧㉨㉩㉪㉫㉬㉭"
+            "㉮㉯㉰㉱㉲㉳㉴㉵㉶㉷㉸㉹㉺㉻"
+            "ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ"
+            "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ"
+            "αβγδεζηθικλμνξοπρστυφχψω"
+        )
+        model_full_names = get_unique_participants(df)
+        alpha2names = dict(zip(alphabets, model_full_names))
+        if len(alpha2names) < len(model_full_names):
+            raise ValueError(
+                f"Tournament viewer cannot visualize more than {len(alphabets)=} participants. ({len(model_full_names)=} is given)\n\nOther features will not be affected but the tournament visualizer."
+            )
+    names2alpha = dict(zip(alpha2names.values(), alpha2names.keys()))
+    df = _impute_byes(df)
+    # preserve readables for later
+    df = _make_readables(df, names2alpha)
+    if len(df[df["round"] == "final"]) != 1:
+        raise ValueError("there must be exactly one final match.")
+    return df, alpha2names
+def _make_readables(df, names2alpha):
+    df["human_readable_model_a"] = df.model_a.copy()
+    df["human_readable_model_b"] = df.model_b.copy()
+    df.model_a = df.model_a.apply(
+        lambda modelname: names2alpha[modelname] if modelname else "x"
+    )
+    df.model_b = df.model_b.apply(
+        lambda modelname: names2alpha[modelname] if modelname else "x"
+    )
+    df["human_readable_idx"] = df.apply(
+        lambda row: f"{row.name}: {row.human_readable_model_a} ({row.model_a}) vs. {row.human_readable_model_b} ({row.model_b if row.model_b else 'x'})",
+        axis=1,
+    )
+    df["winner_resolved"] = df.apply(lambda row: row[row.winner], axis=1)
+    df["winner_nodes"] = df.apply(
+        lambda row: f"{row.winner_resolved}:{row.name}".ljust(4, " "), axis=1
+    )  # later for figure representation of winner as a "node"
+    return df
+# draw
+def draw(df: pd.DataFrame, alpha2names: dict = None) -> str:
+    def _draw_round(
+        df: pd.DataFrame,
+        depth: int = None,
+        winners_in_order: list = None,
+    ) -> Tuple:
+        df_now = df[df.depth == depth]
+        max_depth = df.depth.max()
+        width = 2 ** ((max_depth - depth) + 2)
+        connect_left = "─" * (width)
+        connect_left = connect_left[4:]
+        connect_right = " " * (width)
+        connect_right = "┐" + connect_right[1:]
+        if winners_in_order is None:
+            assert (
+                depth == 0
+            ), f"{winners_in_order=} is only allowed when drawing the top (=final match)"
+            winners_in_order = df_now.winner_nodes
+        round_drawing_parts = []
+        descending_round_winners = []
+        for node in winners_in_order:
+            round_drawing_parts.append("".join([node, connect_left, connect_right]))
+            # next round winners in sync with winner order
+            row_now = df_now.query(f"winner_nodes=='{node}'")
+            descending_round_winners.append(row_now.model_a.item())
+            descending_round_winners.append(row_now.model_b.item())
+        # find descending_round_winners within winner_nodes format (num:alpha)
+        if depth == max_depth:
+            pass  # keep the descending_round_winners intact
+        else:
+            df_descend = df[df.depth == depth + 1]
+            for i, winner_alpha in enumerate(descending_round_winners):
+                node_intr = df_descend.query(
+                    f"winner_resolved=='{winner_alpha}'"
+                ).winner_nodes.item()
+                descending_round_winners[i] = node_intr
+        round_drawing = "".join(round_drawing_parts)
+        descending_unit = " " * width
+        descending_unit = "│" + descending_unit[1:]
+        descending_lines_parts = [descending_unit] * len(df_now) * 2
+        descending_lines = "".join(descending_lines_parts)
+        return round_drawing, descending_lines, descending_round_winners
+    drawings = []
+    winners_in_order = None
+    max_depth = df.depth.max()
+    for depth in range(max_depth + 1):
+        winner_drw, lines_desc, winners_in_order = _draw_round(
+            df,
+            depth=depth,
+            winners_in_order=winners_in_order,
+        )
+        drawings.append((winner_drw, lines_desc))
+    # prepare bracket top
+    champion_alphabet = drawings[0][0].split()[0].split(":")[0]
+    champion_readable = alpha2names[champion_alphabet]
+    bracket_top = [f"🥇winner: {champion_readable}", "│"]
+    # prepare mid
+    bracket_mid = "\n".join(["\n".join(tup) for tup in drawings])
+    # prepare bot
+    initial_participants = winners_in_order
+    bracket_bot = (" " * 3).join(initial_participants)
+    full_figure = "\n".join(bracket_top + [bracket_mid, bracket_bot])
+    return full_figure
+def number_breakdown_from_df(result_df: pd.DataFrame) -> Tuple[str, int, int]:
+    n_models = len(get_unique_participants(result_df))
+    size_testset = int(len(result_df) / (n_models - 1))
+    interpretation = f"total {len(result_df)} matches = (n_models-1) * size_testset = ({n_models}-1) * {size_testset}"
+    return interpretation, n_models, size_testset
+def make_legend_str(df, alpha2names) -> str:
+    first = get_1st(df, alpha2names)
+    alpha2names = {k: v.replace("🥇 ", "") for k, v in alpha2names.items()}
+    alpha_ordered = sorted(list(alpha2names.keys()))
+    # names_ordered = sorted(list(alpha2names.values()))
+    # name2alpha = {v: k for k, v in alpha2names.items()}
+    winner_name = alpha2names.get(first)  # None if there is no unique final
+    for k, v in alpha2names.items():
+        if v == winner_name:
+            alpha2names[k] = f"🥇 {v}"
+    res_str = f"\n\nlegend:"
+    # for name in names_ordered:
+    # alpha = name2alpha[name]
+    for alpha in alpha_ordered:
+        name_w_medal = alpha2names[alpha]
+        res_str += f"\n{alpha}\t{name_w_medal}"
+    return res_str
+def main(
+    jslname: str = "result.json",
+):
+    """
+    Test code: draws the bracket of every tournament in a result file.
+    """
+    df = pd.read_json(jslname, orient="records")
+    df = df.drop(columns=["tstamp", "logs"], errors="ignore")
+    df = index_test_scenario(df)
+    # visualization(df) is skipped here. If everything else works but the app
+    # still fails, the visualization step is the likely culprit (unlikely, though).
+    alpha2names = None  # built from the first tournament, then reused
+    selections = df.idx_inst_src.unique()
+    for i, sel in enumerate(selections):
+        df_now = df[df.idx_inst_src == sel]
+        df_now_processed = None  # never reuse the previous iteration's frame on error
+        try:
+            df_now_processed, _alpha2names = init_tournament_dataframe(
+                df_now, alpha2names=alpha2names
+            )
+            if alpha2names is None:
+                alpha2names = _alpha2names
+            bracket_drawing = draw(df_now_processed, alpha2names=alpha2names)
+            legend = make_legend_str(df_now_processed, alpha2names)
+            print(bracket_drawing + legend)
+            with open(f"{i}.txt", "w", encoding="utf-8") as f:
+                print(bracket_drawing + legend, file=f)
+                print(f"\n\n{sel}", file=f)
+        except Exception as e:
+            with open(f"{i}_err.txt", "w", encoding="utf-8") as f:
+                print(e, file=f)
+                print("", file=f)
+                print(sel, file=f)
+            if df_now_processed is not None:
+                df_now_processed[
+                    [
+                        "depth",
+                        "round",
+                        "winner_nodes",
+                        "winner_resolved",
+                        "winner",
+                        "model_a",
+                        "model_b",
+                    ]
+                ].to_json(f"{i}_err.jsonl", lines=True, orient="records")
+if __name__ == "__main__":
+    Fire(main)
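`index_test_scenario` and `_impute_byes` above rely on a depth convention: the final is depth 0, a round labelled `round-N` (N players, N a power of two) sits at depth log2(N) - 1, and depth d is drawn with 2**d match slots. The standalone sketch below restates that convention with `int.bit_length` instead of a shift loop, and counts how many empty positions a bracket of n models has once it is padded to the next power of two (roughly what the `x` placeholders in the drawing stand for). `round_to_depth` and `bracket_shape` are hypothetical helpers, not functions from the repository.

```python
# Mirrors the round-label convention used in index_test_scenario.
NAMED_ROUNDS = {"final": 0, "semi-final": 1, "quarter-final": 2}


def round_to_depth(label: str) -> int:
    """Map a round label such as 'final' or 'round-16' to its bracket depth."""
    if label in NAMED_ROUNDS:
        return NAMED_ROUNDS[label]
    if label.startswith("round-"):
        n_players = int(label.removeprefix("round-"))
        if n_players < 2 or n_players & (n_players - 1):
            raise ValueError(f"{label!r}: player count must be a power of two >= 2")
        # For a power of two N, bit_length(N) - 1 == log2(N); depth is log2(N) - 1.
        return n_players.bit_length() - 2
    raise ValueError(f"unknown round label: {label!r}")


def bracket_shape(n_models: int) -> tuple[int, int]:
    """Return (max_depth, n_empty_positions) for a single-elimination bracket."""
    if n_models < 2:
        raise ValueError("need at least two models")
    size = 1 << (n_models - 1).bit_length()  # next power of two >= n_models
    max_depth = size.bit_length() - 2  # log2(size) - 1: depth of the first round
    return max_depth, size - n_models
```

For example, 5 models pad to an 8-position bracket: the deepest round is the quarter-final (depth 2) and 3 positions stay empty.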

app.py ADDED

	@@ -0,0 +1,452 @@

+import os
+import select
+import shutil
+import subprocess
+import sys
+import time
+from datetime import datetime, timedelta, timezone
+from pathlib import Path
+from typing import *
+import pandas as pd
+import streamlit as st
+# make the repository root importable before importing varco_arena
+sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
+from varco_arena.varco_arena_core.prompts import load_prompt
+import analysis_utils as au
+from view_utils import (
+    default_page_setting,
+    escape_markdown,
+    set_nav_bar,
+    show_linebreak_in_md,
+    visualization,
+)
+VA_ROOT = Path(os.environ.get("VARCO_ARENA_RESULT_PATH", "./user_submit"))
+USR_SUB = VA_ROOT.parts[-1]
+class DataCache:
+    def __init__(self):
+        self.cache = {}
+    def store(self, key: str, data: dict):
+        self.cache[key] = data
+    def get(self, key: str) -> Optional[dict]:
+        return self.cache.get(key)
+# Initialize the cache in session state if it doesn't exist
+if "data_cache" not in st.session_state:
+    st.session_state.data_cache = DataCache()
+def purge_user_sub_data(data_path_to_purge: Union[Path, str] = None):
+    if data_path_to_purge is None:
+        print("nothing to purge")
+        return
+    else:
+        shutil.rmtree(data_path_to_purge)
+        print(f"purged {str(data_path_to_purge)}")
+        return
+@st.cache_data
+def load_and_cache_data(result_file_path: Optional[str] = None) -> Tuple[Dict, Dict]:
+    """
+    Load a result file, cache the processed data in session state, then purge the
+    whole user submission directory (VA_ROOT). Returns cached data on subsequent calls.
+    Args:
+        result_file_path: Path to the result JSON file
+    Returns:
+        Tuple of (all_result_dict, df_dict)
+    """
+    # Check if we already have cached data for this path
+    if result_file_path:
+        cache_key = str(Path(result_file_path))
+        cached_data = st.session_state.data_cache.get(cache_key)
+        if cached_data:
+            return cached_data["all_result_dict"], cached_data["df_dict"]
+    # Initialize empty dicts
+    all_result_dict = {}
+    df_dict = {}
+    if result_file_path is not None:
+        try:
+            result_file_path = Path(result_file_path)
+            # Read and process data
+            df = pd.read_json(result_file_path)
+            for col in ["tstamp", "logs"]:
+                if col in df.columns:
+                    df.drop(columns=[col], inplace=True)
+            df = au.index_test_scenario(df)
+            fig_dict_per_task = {}
+            df_dict_per_task = {}
+            # Process overall data
+            fig_dict_per_task["Overall"] = visualization(df, is_overall=True)
+            df_dict_per_task["Overall"] = df
+            # Process per-task data
+            for task in df["task"].unique():
+                df_task = df[df["task"] == task]
+                fig_dict_per_task[task] = visualization(df_task, is_overall=False)
+                df_dict_per_task[task] = df_task
+            # Create key from path components
+            prm_name = result_file_path.parts[-2]
+            exp_name = result_file_path.parts[-3]
+            key = f"{exp_name}/{prm_name}"
+            all_result_dict[key] = fig_dict_per_task
+            df_dict[key] = df_dict_per_task
+            # Store in cache before removing file
+            cache_data = {"all_result_dict": all_result_dict, "df_dict": df_dict}
+            st.session_state.data_cache.store(str(result_file_path), cache_data)
+            # Remove user experiment directory
+            purge_user_sub_data(data_path_to_purge=VA_ROOT)
+        except Exception as e:
+            st.error(f"Error processing data: {str(e)}")
+            return {}, {}
+    return all_result_dict, df_dict
+def upload_files(uploaded_files) -> Optional[Path]:  # None when nothing was saved
+    # prep directory for user submission
+    user_sub_root = VA_ROOT
+    if user_sub_root.exists():
+        if not user_sub_root.is_dir():
+            raise ValueError(
+                f"{user_sub_root} file exists and is not a directory. Consider renaming it."
+            )
+    else:
+        user_sub_root.mkdir(parents=True)
+    KST = timezone(timedelta(hours=9))
+    tstamp = datetime.now(KST)
+    tstr = tstamp.strftime("%m-%d_%H:%M:%S")
+    # "./" + str(...) would turn an absolute VA_ROOT into a relative path
+    files_dir = user_sub_root / tstr
+    files_dir.mkdir(parents=True, exist_ok=True)
+    uploaded_files = list(uploaded_files)
+    if not uploaded_files:
+        st.warning("❌ No files to upload. Please drag/drop or browse files to upload.")
+        # purge_user_sub_data(data_path_to_purge=VA_ROOT)
+    elif len(uploaded_files) < 2:
+        st.error("❌ You need at least 2 jsonlines files to properly run VA.")
+        purge_user_sub_data(data_path_to_purge=VA_ROOT)
+    else:  # properly uploaded
+        for file in uploaded_files:
+            # Create a path for the file in the server directory
+            file_path = files_dir / file.name
+            # Save the file to the server directory
+            with open(file_path, "wb") as f:
+                f.write(file.getbuffer())
+        jslfiles = list(files_dir.glob("*.jsonl"))
+        st.success(f"✅ Successfully uploaded {len(jslfiles)} jsonl files.")
+        return files_dir.resolve()
+def run_varco_arena(
+    price_estimation: bool = False,
+    # upload_dir: Union[str, Path] = None,
+    promptname: str = None,
+    exp_name: str = None,
+    api_key: Optional[str] = None,
+    evaluation_model: str = "gpt-4o-mini",
+    update_interval: float = 1.0,
+):
+    import shlex  # quotes values interpolated into the shell command
+
+    # The API key is passed to the CLI with -k; only a masked copy of the
+    # command is printed to the console or shown in the UI.
+    ptn = f"{str(st.session_state.upfiles_dir)}"
+    outdir = Path(ptn)
+    if exp_name:
+        outdir = outdir / exp_name
+    base = (
+        f"python varco_arena/main.py -i {shlex.quote(ptn)} -o {shlex.quote(str(outdir))} "
+        f"-p {shlex.quote(str(promptname))} -e {shlex.quote(str(evaluation_model))} -j 64"
+    )
+    command = f"{base} -k {shlex.quote(str(api_key))}"
+    shown = f"{base} -k ***"
+    if price_estimation:
+        command, shown = f"{command} -c", f"{shown} -c"
+    else:  # answer "y" to any confirmation prompt from the CLI
+        command, shown = f"yes | {command}", f"yes | {shown}"
+    print(shown)
+    api_key = None  # drop the local reference
+    process = subprocess.Popen(
+        command,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.STDOUT,
+        stdin=subprocess.PIPE,
+        text=True,
+        bufsize=1,
+        shell=True,
+    )
+    # Set stdout to non-blocking mode
+    os.set_blocking(process.stdout.fileno(), False)
+    last_update_time = time.time()
+    terminal_output = st.empty()
+    full_output = f"{shown}\n"
+    to_show = full_output  # defined even if the process exits before any update
+    while True:
+        # Wait up to 0.1 s for output instead of spinning on an empty pipe
+        if select.select([process.stdout], [], [], 0.1)[0]:
+            output = process.stdout.readline()
+            if output:
+                full_output += output
+                if price_estimation:
+                    to_show = full_output
+                    terminal_output.code(to_show, language="bash")
+                else:
+                    current_time = time.time()
+                    if current_time - last_update_time > update_interval:
+                        lines = full_output.split("\n")
+                        if len(lines) < 5:
+                            to_show = full_output
+                        else:
+                            to_show = "\n".join(["...\n..\n.\n"] + lines[-5:])
+                        terminal_output.code(to_show, language="bash")
+                        last_update_time = current_time
+                print(output)
+        # Check if the process has finished
+        if process.poll() is not None:
+            # Read any remaining output
+            remaining_output = process.stdout.read()
+            if remaining_output:
+                lines = remaining_output.split("\n")
+                if len(lines) > 10:
+                    to_show += "\n".join(["\n...\n..\n.\n"] + lines[-10:])
+                else:
+                    to_show += remaining_output
+                terminal_output.code(to_show, language="bash")
+                print(remaining_output)
+            break
+    return_code = process.poll()
+    return outdir, return_code
+def main():
+    # init lang
+    st.session_state["korean"] = st.session_state.get("korean", False)
+    sidebar_placeholder = default_page_setting()
+    set_nav_bar(
+        False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="app_init"
+    )
+    st.title("⚔️ VARCO ARENA ⚔️")
+    if st.session_state.korean:
+        st.write(
+            "**VARCO Arena는 각 모델의 생성된 결과를 비교 평가하여 모델의 성능 순위를 제공하는 시스템입니다. 커스텀 테스트셋에 유용하게 활용할 수 있으며, 모범답안을 필요로 하지 않습니다.**"
+        )
+    else:
+        st.write(
+            "**VARCO Arena is an LLM benchmarking system that compares model responses across customized test scenarios without requiring reference answers.**"
+        )
+    st.divider()
+    # Set up the file uploader
+    if st.session_state.korean:
+        st.markdown("### 1. 모델 출력파일 업로드")
+    else:
+        st.markdown("### 1. Upload LLM responses")
+    uploaded_files = st.file_uploader(
+        "Drag and drop JSON Lines files (.jsonl)", accept_multiple_files=True
+    )
+    if st.session_state.korean:
+        st.info("업로드 하신 파일은 자동으로 삭제되며 수집되거나 사용되지 않습니다.")
+    else:
+        st.info(
+            "Your uploads are removed automatically; they are never collected or reused for any purpose."
+        )
+    # upload state
+    if "upfiles_dir" not in st.session_state:
+        st.session_state.upfiles_dir = None
+    if st.button("업로드하기" if st.session_state.korean else "Upload Files"):
+        st.session_state.upfiles_dir = upload_files(uploaded_files)
+    if st.button("업로드한 파일 지우기" if st.session_state.korean else "Purge my uploads"):
+        st.session_state.upfiles_dir = None
+        if VA_ROOT.is_dir():
+            shutil.rmtree(VA_ROOT)
+            st.success(
+                "✅ 업로드한 파일을 서버에서 지웠습니다"
+                if st.session_state.korean
+                else "✅ Removed your uploads from the server successfully"
+            )
+        else:
+            st.error(
+                "❌ 지울 파일이 없습니다"
+                if st.session_state.korean
+                else "❌ You have nothing uploaded"
+            )
+    if st.session_state.korean:
+        with st.expander("❓❔ 무엇을 업로드 하나요❓❔"):
+            st.info(open("guide_mds/input_jsonls_kr.md", encoding="UTF8").read())
+    else:
+        with st.expander("❓❔  What should I upload ❓❔"):
+            st.info(open("guide_mds/input_jsonls_en.md", encoding="UTF8").read())
+    # Form for cost estimation
+    with st.form("cost_estimation_form"):
+        if st.session_state.korean:
+            st.write("### 2. 가격 산정")
+        else:
+            st.write("### 2. Cost Estimation")
+        eval_model = st.selectbox(
+            "Select Judge",
+            open("eval_models_list.txt", encoding="UTF8").read().split(),  # drops blank lines
+        )
+        promptname = st.selectbox(
+            "Select Evaluation Prompt",
+            open("eval_prompt_list.txt", encoding="UTF8").read().split(),  # drops blank lines
+        )
+        if st.session_state.korean:
+            st.markdown("*`llmbar`외 다른 프롬프트는 충분히 검증된 프롬프트는 아닙니다. (동작은 함)")
+        else:
+            st.markdown(
+                "*Eval prompts other than `llmbar` work, but have not been thoroughly validated."
+            )
+        if promptname == USR_SUB:
+            raise ValueError(
+                f"{USR_SUB=} is a reserved name for the system. Choose another name for the prompt or change {VA_ROOT=} (USR_SUB == VA_ROOT.parts[-1])."
+            )
+        estimate_button = st.form_submit_button("Calculate Cost!")
+        with st.expander(
+            "LLM Judge에 활용되는 프롬프트 (`Calculate Cost!` 클릭시 갱신)"
+            if st.session_state.korean
+            else "**Evaluation Prompt for LLM Judge (will refresh after `Calculate Cost!` clicked)**"
+        ):
+            prompt = load_prompt(promptname, task="-")
+            kwargs = dict(
+                inst="{inst}",
+                src="{src}",
+                out_a="{out_a}",
+                out_b="{out_b}",
+                task="-",
+            )
+            if promptname == "translation_pair":
+                kwargs["source_lang"] = "{source_lang}"
+                kwargs["target_lang"] = "{target_lang}"
+            prompt_cmpl = prompt.complete_prompt(**kwargs)
+            st.markdown(f"### Evaluation Prompt: {promptname}")
+            for msg in prompt_cmpl:
+                st.markdown(f"**{msg['role']}**")
+                st.info(show_linebreak_in_md(escape_markdown(msg["content"])))
+        if estimate_button:
+            if st.session_state.get("upfiles_dir") is None:
+                st.error(
+                    "❌ Requirements: You have to upload jsonlines files first to proceed"
+                )
+            else:
+                st.markdown("##### Estimated Cost")
+                dummy_api_key = "dummy"
+                dummy_exp_name = "dummy"
+                result_file_path, return_code = run_varco_arena(
+                    # upload_dir=st.session_state.upfiles_dir,
+                    promptname=promptname,
+                    api_key=dummy_api_key,
+                    exp_name=dummy_exp_name,
+                    price_estimation=True,
+                    evaluation_model=eval_model,
+                )
+                if return_code:
+                    st.error(
+                        "❌ RuntimeError: An error occurred during cost estimation. **Restart from file upload!**"
+                    )
+                    purge_user_sub_data(data_path_to_purge=VA_ROOT)
+                else:
+                    st.success("✅ Cost estimation completed successfully")
+                    st.session_state.cost_estimated = True
+    # Form for actual run
+    with st.form("run_arena_form"):
+        if st.session_state.korean:
+            st.write("### 3. Varco Arena 구동하기")
+        else:
+            st.write("### 3. Run Varco Arena")
+        api_key = st.text_input("Enter your OpenAI API Key", type="password")
+        # for the demo, the experiment name is fixed to the submission time (KST)
+        KST = timezone(timedelta(hours=9))
+        tstamp = datetime.now(KST)
+        tstr = tstamp.strftime("%m-%d_%H:%M:%S")
+        exp_name = f"{tstr}_KST_submit"
+        if st.session_state.korean:
+            st.write("**주의**:`Ctrl+C` 버튼은 구현되지 않았습니다. 구동 전 숙고해주세요.")
+        else:
+            st.write("**Caution: a `Ctrl+C` (stop) button has not been implemented. Think carefully before running.**")
+        run_button = st.form_submit_button(
+            "🔥 Run Arena!",
+            disabled=(not st.session_state.get("cost_estimated", False))
+            or "result_file_path"
+            in st.session_state.keys(),  # run already performed once
+        )
+        if run_button:
+            set_nav_bar(
+                True,
+                sidebar_placeholder=sidebar_placeholder,
+                toggle_hashstr="app_during_run",
+            )
+            if st.session_state.get("upfiles_dir") is None:
+                st.error(
+                    "❌ Requirements: You have to upload jsonlines files first to proceed"
+                )
+            elif not api_key:
+                st.error("❌ Requirements: OpenAI key required to run VA.")
+            else:
+                result_file_path, return_code = run_varco_arena(
+                    # upload_dir=st.session_state.upfiles_dir,
+                    promptname=promptname,
+                    api_key=api_key,
+                    exp_name=exp_name,
+                    price_estimation=False,
+                    evaluation_model=eval_model,
+                )
+                if return_code:
+                    st.error(
+                        "❌ RuntimeError: An error occurred during Varco Arena run. Check the file and **restart from file upload!**"
+                    )
+                    purge_user_sub_data(data_path_to_purge=VA_ROOT)
+                else:
+                    st.success("✅ Varco Arena run completed successfully")
+                    st.session_state.result_file_path = max(
+                        result_file_path.glob("**/result.json"),
+                        key=lambda p: p.stat().st_mtime,  # most recently written result
+                    )
+    set_nav_bar(
+        False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="app_run_done"
+    )
+    if st.session_state.get("result_file_path", None) is not None:
+        print(f"{st.session_state.get('result_file_path', None)=}")
+        load_and_cache_data(result_file_path=str(st.session_state.result_file_path))
+if __name__ == "__main__":
+    main()
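The output loop in `run_varco_arena` polls a non-blocking pipe with `select.select`, which only works on pipes under POSIX, and re-renders a short tail of the log. A simpler sketch of the same idea iterates over the pipe (blocking until each line arrives) and keeps the tail in a bounded `deque`. It also passes an argument list instead of a `shell=True` string, so the API key never goes through shell parsing. `run_and_tail` and its `on_line` callback are hypothetical names, not part of VARCO Arena, and the `yes |` auto-confirmation is left out (it would have to be replaced, for example by writing answers to `stdin`).

```python
from __future__ import annotations

import subprocess
import sys
from collections import deque
from typing import Callable


def run_and_tail(
    cmd: list[str],
    n_tail: int = 5,
    on_line: Callable[[str], None] | None = None,
) -> tuple[int, list[str]]:
    """Run `cmd`, stream its merged stdout/stderr line by line, and
    return (exit code, last `n_tail` lines)."""
    tail: deque[str] = deque(maxlen=n_tail)  # keeps only the most recent lines
    with subprocess.Popen(
        cmd,  # argument list: no shell parsing of secrets or paths
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line-buffered on the reading side
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:  # blocks until a full line or EOF arrives
            line = line.rstrip("\n")
            tail.append(line)
            if on_line is not None:
                on_line(line)  # e.g. refresh a UI placeholder
        return_code = proc.wait()
    return return_code, list(tail)


if __name__ == "__main__":
    code, last = run_and_tail(
        [sys.executable, "-c", "for i in range(10): print(i)"], n_tail=3
    )
    print(code, last)  # 0 ['7', '8', '9']
```

Lines arrive only as fast as the child flushes them: a Python child writing to a pipe buffers its output, so it needs `python -u` (or `PYTHONUNBUFFERED=1`) for live updates. In Streamlit, `on_line` could append to a buffer and refresh an `st.empty()` placeholder.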

eval_models_list.txt ADDED

	@@ -0,0 +1,3 @@

+gpt-4o-mini
+gpt-4o-2024-05-13
+gpt-4o-2024-08-06

eval_prompt_list.txt ADDED

	@@ -0,0 +1,3 @@

+llmbar
+translation_pair
+rag_pair_kr
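`app.py` reads both lists with `.read().split("\n")`. If a list file ends with a newline, that yields a trailing empty string, which then shows up as a blank option in the selectbox (whether these two files end with a newline is not visible in this diff). A small helper that skips blank lines, sketched here as the hypothetical `read_option_list`:

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def read_option_list(path: str | Path) -> list[str]:
    """Return one option per non-blank line, with surrounding whitespace removed."""
    text = Path(path).read_text(encoding="UTF8")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "eval_prompt_list.txt"
        path.write_text("llmbar\ntranslation_pair\nrag_pair_kr\n", encoding="UTF8")
        # naive split keeps an empty last element
        print(path.read_text(encoding="UTF8").split("\n"))  # ['llmbar', 'translation_pair', 'rag_pair_kr', '']
        print(read_option_list(path))  # ['llmbar', 'translation_pair', 'rag_pair_kr']
```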

guide_mds/input_jsonls_en.md ADDED

	@@ -0,0 +1,37 @@

+#### \[EN\] Upload guide (`jsonl`)
+**Basic Requirements**
+  * Upload one `jsonl` file per model (e.g., five files to compare five LLMs)
+  * ⚠️ Important: All `jsonl` files must have the same number of rows
+  * ⚠️ Important: Each file must use a single `model_id` for all of its rows, and no two files may share the same `model_id`
+**Required Fields**
+* Per Model Fields
+  * `model_id`: Unique identifier for the model (recommendation: keep it short)
+  * `generated`: The LLM's response to the test instruction
+* Required only for translation (the `translation_pair` prompt needs these fields; see `streamlit_app_local/user_submit/mt/llama5.jsonl` for an example)
+  * `source_lang`: input language (e.g. Korean, KR, kor, ...)
+  * `target_lang`: output language (e.g. English, EN, ...)
+* Common Fields (Must be identical across all files)
+  * `instruction`: The input prompt or test instruction given to the model
+  * `task`: Category label used to group results (useful when using different evaluation prompts per task)
+**Example Format**
+```python
+# model1.jsonl
+{"model_id": "model1", "task": "directions", "instruction": "Where should I go?", "generated": "Over there"}
+{"model_id": "model1", "task": "arithmetic", "instruction": "1+1", "generated": "2"}
+# model2.jsonl
+{"model_id": "model2", "task": "directions", "instruction": "Where should I go?", "generated": "Head north"}
+{"model_id": "model2", "task": "arithmetic", "instruction": "1+1", "generated": "3"}
+...
+..
+.
+```
+**Use Case Example**
+If you want to compare different prompting strategies for the same model:
+* Use the same `instruction` values in every file, since all strategies are tested on the same scenarios.
+* Only the `generated` responses differ between files, one file per prompting strategy.
+* Use descriptive `model_id` values like "prompt1", "prompt2", etc.
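The upload rules above can be checked locally before uploading. A minimal sketch, assuming rows are aligned by position across files (row *i* of every file answers the same test scenario); `validate_uploads` is a hypothetical helper rather than VARCO Arena's own validation, and it does not check the translation-only fields `source_lang` / `target_lang`.

```python
from __future__ import annotations

import json

REQUIRED_FIELDS = ("model_id", "generated", "instruction", "task")
COMMON_FIELDS = ("instruction", "task")  # must match across files, row by row


def validate_uploads(files: dict[str, str]) -> list[str]:
    """Check jsonl texts (file name -> content) against the upload guide.

    Returns a list of problems; an empty list means nothing was flagged.
    Raises json.JSONDecodeError if a non-blank line is not valid JSON.
    """
    problems: list[str] = []
    rows = {
        name: [json.loads(line) for line in text.splitlines() if line.strip()]
        for name, text in files.items()
    }

    # Every row needs the per-model fields and the common fields.
    for name, file_rows in rows.items():
        for i, row in enumerate(file_rows):
            missing = [field for field in REQUIRED_FIELDS if field not in row]
            if missing:
                problems.append(f"{name} row {i}: missing {missing}")

    # Exactly one model_id per file, and no model_id shared between files.
    owner: dict[str, str] = {}
    for name, file_rows in rows.items():
        ids = {str(row.get("model_id")) for row in file_rows}
        if len(ids) != 1:
            problems.append(f"{name}: expected one model_id, found {sorted(ids)}")
            continue
        (model_id,) = ids
        if model_id in owner:
            problems.append(f"{name}: model_id {model_id!r} also used in {owner[model_id]}")
        else:
            owner[model_id] = name

    # Same number of rows everywhere, then identical common fields per row.
    lengths = {name: len(file_rows) for name, file_rows in rows.items()}
    if len(set(lengths.values())) > 1:
        problems.append(f"row counts differ: {lengths}")
    elif rows:
        ref_name, ref_rows = next(iter(rows.items()))
        for name, file_rows in rows.items():
            for i, (ref, row) in enumerate(zip(ref_rows, file_rows)):
                for field in COMMON_FIELDS:
                    if ref.get(field) != row.get(field):
                        problems.append(f"{name} row {i}: {field!r} differs from {ref_name}")
    return problems
```

To check files on disk, one could pass `{p.name: p.read_text(encoding="UTF8") for p in Path("my_outputs").glob("*.jsonl")}` (with `from pathlib import Path`), where `my_outputs` is a placeholder directory.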

guide_mds/input_jsonls_kr.md ADDED

	@@ -0,0 +1,34 @@

+#### \[KR\] 집어넣을 jsonl 파일 가이드
+비교할 모델이 다섯 개라면 다섯 개의 .jsonl 파일을 업로드하세요.
+* 💥모든 jsonl 은 같은 수의 행을 가져야합니다.
+* 💥`model_id` 필드는 파일마다 달라야하며 파일 내에서는 유일해야합니다.
+**jsonl 필수 필드**
+* 개별
+  * `model_id`: 평가받는 모델의 이름입니다. (짧게 쓰는 것 추천)
+  * `generated`: 모델이 testset instruction 에 생성한 응답을 넣으세요.
+* 번역평가 프롬프트 사용시 (`translation_pair`. `streamlit_app_local/user_submit/mt/llama5.jsonl` 에서 예시 볼 수 있음)
+  * `source_lang`: input language (e.g. Korean, KR, kor, ...)
+  * `target_lang`: output language (e.g. English, EN, ...)
+* 공통 부분 (**모든 파일에 대해 같아야 함**)
+  * `instruction`: 모델에 집어넣는 `testset instruction` 혹은 `input`에 해당하는 무언가입니다.
+  * `task`: 전체 결과를 subset으로 그룹지어서 보여줄 때 사용됩니다. `evaluation prompt`를 행별로 다르게 사용하고 싶을 때 활용될 수 있습니다.
+각 jsonl 파일은 아래처럼 생겼습니다.
+```python
+# model1.jsonl
+{"model_id": "모델1", "task": "길 묻기", "instruction": "어디로 가야하오", "generated": "저기로요"}
+{"model_id": "모델1", "task": "산수", "instruction": "1+1", "generated": "2"} # 길 묻기와 산수의 경우 다른 평가 프롬프트를 사용하고 싶을 수 있겠죠?
+# model2.jsonl: model1.jsonl과 `instruction`은 같고 `generated`, `model_id` 는 다릅니다!
+{"model_id": "모델2", "task": "길 묻기", "instruction": "어디로 가야하오", "generated": "하이"}
+{"model_id": "모델2", "task": "산수", "instruction": "1+1", "generated": "3"}
+...
+..
+```
+예를 들어, 한가지 모델에 대해 다른 프롬프팅을 시도하여 다른 생성문을 얻었고 이를 비교하고 싶은 경우를 생각해봅시다. 이 때 평가받을 testset은 같으므로 `instruction`은 모두 같고 프롬프팅에 따라 `generated`는 달라지겠죠? `model_id` 는 `"prompt1"`, `"prompt2"` 등 취향에 맞게 적어주시면 됩니다.

modules/__pycache__/nav.cpython-311.pyc ADDED

Binary file (3.8 kB).

modules/nav.py ADDED

	@@ -0,0 +1,57 @@

+import streamlit as st
+def Navbar(sidebar_placeholder, toggle_hashstr: str = ""):
+    with sidebar_placeholder.container():
+        st.toggle(
+            "🇰🇷 한국어로 보기",
+            value=st.session_state.get("korean", False),
+            key=f"korean_toggle_{toggle_hashstr}",  # explicit per-page key so on_change can read the toggle's value
+            on_change=lambda: setattr(
+                st.session_state,
+                "korean",
+                st.session_state.get(f"korean_toggle_{toggle_hashstr}", False),
+            ),
+        )
+        st.page_link(
+            "app.py",
+            label="Varco Arena 구동" if st.session_state.korean else "Run VARCO Arena",
+            icon="🔥",
+        )
+        st.page_link(
+            "pages/see_results.py",
+            label="결과 보기" if st.session_state.korean else "See Results",
+            icon="📝",
+            disabled=st.session_state.get("is_running", False),
+        )
+        st.page_link(
+            "pages/brief_intro.py",
+            label="어떻게 작동하나요?" if st.session_state.korean else "How it Works",
+            icon="❔",
+            disabled=st.session_state.get("is_running", False),
+        )
+        st.page_link(
+            "pages/quick_start_guide.py",
+            label="본격 사용하기 가이드" if st.session_state.korean else "Quick Start Guide",
+            icon="💯",
+            disabled=st.session_state.get("is_running", False),
+        )
+        related_links_en = """
+**About**
+* [Paper](https://huggingface.co/papers/2411.01281)
+* [Blog (KR)](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
+* [Inquiry](https://linkedin.com/in/deftson/)
+""".strip()
+        related_links_kr = """
+**About**
+* [논문](https://huggingface.co/papers/2411.01281)
+* [블로그](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
+* [문의](https://linkedin.com/in/deftson/)
+""".strip()
+        st.info(related_links_kr if st.session_state.korean else related_links_en)
+        st.divider()
+        demo_warning_kr = "❕❗**본 데모에서는 새로고침 후에 평가 결과가 보존되지 않습니다**❕❗ 앱을 온전히 활용하시려면 개인 기기에서 호스팅하세요 (**💯본격 사용하기 가이드** 참조)"
+        demo_warning_en = "❕❗For this demo, **evaluation results will not be preserved after refreshing**❕❗ To fully utilize the app, please host it on your personal device (refer to **💯Quick Start Guide**)"
+        st.markdown(demo_warning_kr if st.session_state.korean else demo_warning_en)

pages/__pycache__/see_results.cpython-311.pyc ADDED

Binary file (26.3 kB).

pages/brief_intro.py ADDED

	@@ -0,0 +1,40 @@

+import streamlit as st
+from view_utils import default_page_setting, set_nav_bar
+sidebar_placeholder = default_page_setting()
+set_nav_bar(
+    False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="brief_intro_init"
+)
+st.image("va_en.png")
+if st.session_state.korean:
+    st.info("여러분이 업로드하는 파일은 위 그림의 LLM 출력들입니다.")
+    with st.expander("❓❔ 무엇을 업로드 하나요❓❔"):
+        st.info(open("guide_mds/input_jsonls_kr.md", encoding="UTF8").read())
+else:
+    st.info(
+        "The files you upload contain the LLMs' responses to the test prompts (the LLM outputs in the figure above)."
+    )
+    with st.expander("❓❔  What should I upload ❓❔"):
+        st.info(open("guide_mds/input_jsonls_en.md", encoding="UTF8").read())
+st.image("va_concept_new.png")
+st.markdown(
+    """
+| |Current Practice|Varco Arena|
+|-|-|-|
+|Total no. of matches|$$n_{\\text{model}} \\times \\|X\\|$$|$$(n_{\\text{model}}-1) \\times \\|X\\|$$|
+|No. of matches per LLM|$$\\|X\\|$$|$$\\left[\\|X\\|,\\|X\\|\\log n_{\\text{model}}\\right]$$|
+|Comparison type|reference-mediated (anchored)|direct|
+            """
+)
+if st.session_state.korean:
+    st.info(
+        "Varco Arena는 신뢰성 있는 순위를 더 적은 횟수의 비교 내에 얻어내며, 이러한 특징은 LLM 직접 비교의 이점으로부터 기인합니다."
+    )
+else:
+    st.info(
+        "Varco Arena compares LLM responses directly against each other, which yields reliable rankings with fewer total matches."
+    )
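The table's match counts can be checked with a small simulation. This sketch assumes one single-elimination bracket per test scenario, random winners, and a bye for the odd player out in any round; how VARCO Arena actually seeds brackets and assigns byes is not shown here. Under those assumptions a bracket over n models always has n - 1 matches, and each model plays between 1 and ⌈log₂ n⌉ of them, which gives the per-LLM range in the table once multiplied by |X| (the table's `log` is read here as ⌈log₂ ·⌉). `single_elimination_counts` is a hypothetical name.

```python
from __future__ import annotations

import math
import random
from collections import Counter


def single_elimination_counts(n_models: int, seed: int = 0) -> tuple[int, Counter]:
    """Simulate one single-elimination bracket with random winners.

    Returns (total matches, matches played per model). When the number of
    remaining models is odd, the last one gets a bye into the next round.
    """
    rng = random.Random(seed)
    alive = list(range(n_models))
    played: Counter = Counter()
    total = 0
    while len(alive) > 1:
        rng.shuffle(alive)
        next_round = [alive[-1]] if len(alive) % 2 else []  # bye
        for a, b in zip(alive[0::2], alive[1::2]):  # zip drops the bye player
            played[a] += 1
            played[b] += 1
            total += 1
            next_round.append(a if rng.random() < 0.5 else b)
        alive = next_round
    return total, played


if __name__ == "__main__":
    n_models, n_prompts = 8, 100  # illustrative sizes, not from the source
    print("anchored, total:", n_models * n_prompts)  # anchored, total: 800
    print("tournament, total:", (n_models - 1) * n_prompts)  # tournament, total: 700
    rounds = math.ceil(math.log2(n_models))
    print("per model:", n_prompts, "to", rounds * n_prompts)  # per model: 100 to 300
    total, played = single_elimination_counts(n_models)
    print(total, sorted(played.values()))  # 7 [1, 1, 1, 1, 2, 2, 3, 3]
```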

pages/quick_start_guide.py ADDED

	@@ -0,0 +1,14 @@

+import streamlit as st
+from view_utils import default_page_setting, set_nav_bar
+sidebar_placeholder = default_page_setting()
+set_nav_bar(
+    False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="quickstart_init"
+)
+if st.session_state.korean:
+    st.markdown(open("varco_arena/README_kr.md", encoding="UTF8").read())
+else:
+    st.markdown(open("varco_arena/README_en.md", encoding="UTF8").read())

pages/see_results.py ADDED

	@@ -0,0 +1,365 @@

+import pandas as pd
+import streamlit as st
+import analysis_utils as au
+from analysis_utils import number_breakdown_from_df
+from app import load_and_cache_data
+# from app import VA_ROOT
+from query_comp import QueryWrapper, get_base_url
+from varco_arena.varco_arena_core.prompts import load_prompt
+from view_utils import (
+    default_page_setting,
+    escape_markdown,
+    set_nav_bar,
+    show_linebreak_in_md,
+)
+DEFAULT_LAYOUT_DICT = {
+    "title": {"font": {"size": 20, "family": "Gothic A1"}},
+    "font": {"size": 16, "family": "Gothic A1"},
+    "xaxis": {"tickfont": {"size": 12, "family": "Gothic A1"}},
+    "yaxis": {"tickfont": {"size": 12, "family": "Gothic A1"}},
+    "legend": {"font": {"size": 12, "family": "Gothic A1"}},
+}
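+def navigate(t, source, key, val):
+    """Move the selection stored under `key` by `val` steps within `t`."""
+    if source is None or source not in t:
+        return  # nothing selected yet, or a stale selection from another list
+    target_index = t.index(source) + val
+    if 0 <= target_index < len(t):
+        st.session_state[key] = t[target_index]
+        st.rerun()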
+def main():
+    sidebar_placeholder = default_page_setting(layout="wide")
+    set_nav_bar(
+        False,
+        sidebar_placeholder=sidebar_placeholder,
+        toggle_hashstr="see_results_init",
+    )
+    # load the data
+    # print(f"{st.session_state.get('result_file_path', None)=}")
+    most_recent_run = st.session_state.get("result_file_path", None)
+    most_recent_run = str(most_recent_run) if most_recent_run is not None else None
+    (
+        st.session_state["all_result_dict"],
+        st.session_state["df_dict"],
+    ) = load_and_cache_data(result_file_path=most_recent_run)
+    # side bar
+    st.sidebar.title("Select Result:")
+    result_select = QueryWrapper("expname")(
+        st.sidebar.selectbox,
+        list(st.session_state["all_result_dict"].keys()),
+    )
+    if result_select is None:
+        st.stop()
+    eval_prompt_name = result_select.split("/")[-1].strip()
+    if st.sidebar.button("Clear Cache"):
+        st.cache_data.clear()
+        st.cache_resource.clear()
+        st.rerun()
+    if result_select:
+        if "alpha2names" in st.session_state:
+            del st.session_state["alpha2names"]
+    fig_dict_per_task = st.session_state["all_result_dict"][result_select]
+    task_list = list(fig_dict_per_task.keys())
+    elo_rating_by_task = fig_dict_per_task["Overall"]["elo_rating_by_task"]
+    # tabs = st.tabs(task_list)
+    df_dict_per_task = st.session_state["df_dict"][result_select]
+    default_layout_dict = DEFAULT_LAYOUT_DICT
+    task = QueryWrapper("task", "Select Task")(st.selectbox, task_list)
+    if task is None:
+        st.stop()
+    figure_dict = fig_dict_per_task[task]
+    judgename = figure_dict["judgename"]
+    df = df_dict_per_task[task]
+    interpretation, n_models, size_testset = number_breakdown_from_df(df)
+    if st.session_state.korean:
+        st.markdown(f"## 결과 ({task})")
+        st.markdown(f"##### Judge 모델: {judgename} / 평가프롬: {eval_prompt_name}")
+        st.markdown(f"##### 테스트셋 사이즈: {int(size_testset)} 행")
+    else:
+        st.markdown(f"## Results ({task})")
+        st.markdown(f"##### Judge Model: {judgename} / prompt: {eval_prompt_name}")
+        st.markdown(f"##### Size of Testset: {int(size_testset)} rows")
+    col1, col2 = st.columns(2)
+    with col1:
+        with st.container(border=True):
+            st.markdown(f"#### Ratings ({task})")
+            st.table(figure_dict["elo_rating"])
+            st.write(show_linebreak_in_md(escape_markdown(interpretation)))
+    with col2:
+        with st.container(border=True):
+            st.plotly_chart(
+                elo_rating_by_task.update_layout(**default_layout_dict),
+                use_container_width=True,
+                key=f"{task}_elo_rating_by_task",
+            )
+    st.divider()
+    if st.session_state.korean:
+        st.markdown("### 토너먼트 (테스트 시나리오) 별로 보기")
+    else:
+        st.markdown("### Tournament Results by Test Scenario")
+    # with st.expander("볼 토너먼트 고르기"):
+    d = list(df.idx_inst_src.unique())
+    default_idx = st.session_state.get("selected_tournament", None)
+    cols = st.columns((1, 18, 1))
+    with cols[0]:
+        if st.button("◀", key="prev_tournament"):
+            navigate(d, default_idx, "selected_tournament", -1)
+    with cols[1]:
+        tournament_prm_select = QueryWrapper("tournament", "Select Tournament")(
+            st.selectbox,
+            d,
+            default_idx,
+            key=f"{task}_tournament_select",
+            on_change=lambda: st.session_state.update(
+                selected_tournament=st.session_state.get(f"{task}_tournament_select"),
+                selected_match=None,
+            ),
+            label_visibility="collapsed",
+        )
+    with cols[2]:
+        if st.button("▶", key="next_tournament"):
+            navigate(d, default_idx, "selected_tournament", 1)
+    # tournament_prm_select = st.selectbox(
+    #     "Select Tournament",
+    #     df.idx_inst_src.unique(),
+    #     index=d.index(st.session_state.get("selected_tournament")),
+    #     key=f"{task}_tournament_{result_select}",
+    # )
+    # print(tournament_prm_select, type(tournament_prm_select))
+    st.session_state["selected_tournament"] = tournament_prm_select
+    # tournament_prm_select = st.selectbox(
+    #     "Select Tournament",
+    #     df.idx_inst_src.unique(),
+    #     key=f"{task}_tournament_{result_select}",
+    # )
+    df_now_processed = None
+    if tournament_prm_select:
+        df_now = df[df.idx_inst_src == tournament_prm_select]
+        df_now_processed, _alpha2names = au.init_tournament_dataframe(
+            df_now,
+            alpha2names=st.session_state["alpha2names"]
+            if "alpha2names" in st.session_state.keys()
+            else None,
+        )
+        if "alpha2names" not in st.session_state:
+            st.session_state["alpha2names"] = _alpha2names
+        try:
+            bracket_drawing = au.draw(
+                df_now_processed,
+                alpha2names=st.session_state["alpha2names"],
+            )
+            legend = au.make_legend_str(
+                df_now_processed, st.session_state["alpha2names"]
+            )
+            st.code(bracket_drawing + legend)
+            m = list(df_now_processed.human_readable_idx)
+            default_idx = st.session_state.get("selected_match", None)
+            cols = st.columns((1, 18, 1))
+            with cols[0]:
+                if st.button("◀", key="prev_match"):
+                    navigate(m, default_idx, "selected_match", -1)
+            with cols[1]:
+                match_idx_human = QueryWrapper("match", "Select Match")(
+                    st.selectbox,
+                    m,
+                    default_idx,
+                    key=f"{task}_match_select",
+                    label_visibility="collapsed",
+                )
+            with cols[2]:
+                if st.button("▶", key="next_match"):
+                    navigate(m, default_idx, "selected_match", 1)
+            # match_idx_human = st.selectbox(
+            #     "Select Match",
+            #     df_now_processed.human_readable_idx,
+            #     key=f"{task}_match_{result_select}",
+            # )
+            # print(match_idx_human)
+            st.session_state["selected_match"] = match_idx_human
+            if match_idx_human:
+                match_idx = int(match_idx_human.split(": ")[0])
+                row = df_now_processed.loc[match_idx]
+                st.markdown("#### Current Test Scenario:")
+                with st.expander(
+                    f"### Evaluation Prompt (evalprompt: {eval_prompt_name}--{task})"
+                ):
+                    prompt = load_prompt(eval_prompt_name, task=task)
+                    kwargs = dict(
+                        inst="{inst}",
+                        src="{src}",
+                        out_a="{out_a}",
+                        out_b="{out_b}",
+                        task=task,
+                    )
+                    if eval_prompt_name == "translation_pair":
+                        kwargs["source_lang"] = "{source_lang}"
+                        kwargs["target_lang"] = "{target_lang}"
+                    prompt_cmpl = prompt.complete_prompt(**kwargs)
+                    for msg in prompt_cmpl:
+                        st.markdown(f"**{msg['role']}**")
+                        st.info(show_linebreak_in_md(escape_markdown(msg["content"])))
+                st.info(show_linebreak_in_md(tournament_prm_select))
+                winner = row.winner
+                col1, col2 = st.columns(2)
+                winnerbox = st.success
+                loserbox = st.error
+                with col1:
+                    iswinner = winner == "model_a"
+                    writemsg = winnerbox if iswinner else loserbox
+                    st.markdown(f"#### ({row.model_a}) {row.human_readable_model_a}")
+                    writemsg(
+                        show_linebreak_in_md(row.generated_a),
+                        icon="✅" if iswinner else "❌",
+                    )
+                with col2:
+                    iswinner = winner == "model_b"
+                    writemsg = winnerbox if iswinner else loserbox
+                    st.markdown(f"#### ({row.model_b}) {row.human_readable_model_b}")
+                    writemsg(
+                        show_linebreak_in_md(row.generated_b),
+                        icon="✅" if iswinner else "❌",
+                    )
+        except Exception as e:
+            import traceback
+            traceback.print_exc()
+            st.markdown(
+                "**Bug: 아래 표를 복사해서 이슈로 남겨주시면 개선에 도움이 됩니다. 감사합니다🙏**"
+                if st.session_state.korean
+                else "**Bug: Please open an issue and attach the table below; it helps a lot with fixing this. Thanks in advance🙏**"
+            )
+            st.error(e)
+            st.info(tournament_prm_select)
+            st.table(
+                df_now_processed[
+                    [
+                        "depth",
+                        "round",
+                        "winner_nodes",
+                        "winner_resolved",
+                        "winner",
+                        "model_a",
+                        "model_b",
+                    ]
+                ]
+            )
+    st.write("Shareable link")
+    st.code(f"{get_base_url()}/see_results?{QueryWrapper.get_sharable_link()}")
+    st.divider()
+    if st.session_state.korean:
+        st.markdown("### 매치 통계")
+    else:
+        st.markdown("### Match Stats.")
+    col1, col2 = st.columns(2)
+    with col1:
+        with st.container(border=True):
+            st.plotly_chart(
+                figure_dict[
+                    "fraction_of_model_a_wins_for_all_a_vs_b_matches"
+                ].update_layout(autosize=True, **default_layout_dict),
+                use_container_width=True,
+                key=f"{task}_fraction_of_model_a_wins_for_all_a_vs_b_matches",
+            )
+    with col2:
+        with st.container(border=True):
+            st.plotly_chart(
+                figure_dict["match_count_of_each_combination_of_models"].update_layout(
+                    autosize=True, **default_layout_dict
+                ),
+                use_container_width=True,
+                key=f"{task}_match_count_of_each_combination_of_models",
+            )
+    with col1:
+        with st.container(border=True):
+            st.plotly_chart(
+                figure_dict["match_count_for_each_model"].update_layout(
+                    **default_layout_dict
+                ),
+                use_container_width=True,
+                key=f"{task}_match_count_for_each_model",
+            )
+    with col2:
+        pass
+    if st.session_state.korean:
+        st.markdown("### 참고용 LLM Judge 편향 정보")
+    else:
+        st.markdown("### FYI: How biased is your LLM Judge?")
+    with st.expander("펼쳐서 보기" if st.session_state.korean else "Expand to show"):
+        st.info(
+            """
+Varco Arena에서는 position bias의 영향을 최소화하기 위해 모든 모델이 A나 B위치에 번갈아 위치하도록 하였습니다. 그러나 LLM Judge 혹은 Prompt의 성능이 부족하다고 느껴진다면, 아래 알려진 LLM Judge bias가 참고가 될겁니다.
+* position bias (왼쪽)
+* length bias (오른쪽)
+결과의 왜곡이 LLM Judge의 부족함 때문이었다는 점을 규명하려면 사용하신 LLM Judge와 Prompt의 binary classification 정확도를 측정해보시길 바랍니다 (Varco Arena를 활용하여 이를 수행해볼 수 있습니다!).""".strip()
+            if st.session_state.korean
+            else """
+In Varco Arena, to minimize the effect of position bias, all models are alternately positioned in either position A or B. However, if you feel the LLM Judge or Prompt performance is insufficient, the following known LLM Judge biases may be helpful to reference:
+* position bias (left)
+* length bias (right)
+To determine if result distortion was due to LLM Judge limitations, please measure the binary classification accuracy of your LLM Judge and Prompt (You could use Varco Arena for this purpose!).
+""".strip()
+        )
+        st.markdown(f"#### {judgename} + prompt = {eval_prompt_name}")
+        col1, col2 = st.columns(2)
+        with col1:
+            with st.container(border=True):
+                st.plotly_chart(
+                    figure_dict["counts_of_match_winners"].update_layout(
+                        **default_layout_dict
+                    ),
+                    use_container_width=True,
+                    key=f"{task}_counts_of_match_winners",
+                )
+        with col2:
+            with st.container(border=True):
+                st.plotly_chart(
+                    figure_dict["length_bias"].update_layout(**default_layout_dict),
+                    use_container_width=True,
+                    key=f"{task}_length_bias",
+                )
+                st.table(figure_dict["length_bias_df"].groupby("category").describe().T)
+if __name__ == "__main__":
+    main()
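The FYI panel suggests checking position bias, length bias, and the judge's binary classification accuracy. The statistics below are simplified stand-ins and not necessarily how the app's `counts_of_match_winners` and `length_bias` figures are computed: they assume each match record carries the `winner` (`"model_a"` or `"model_b"`), `generated_a`, and `generated_b` fields used on this page, and `judge_accuracy` assumes you have human preference labels (hypothetical here) for the same matches. Because models alternate between positions A and B, a share of A wins far from 0.5 points at the judge rather than at the models.

```python
from __future__ import annotations

import math
from typing import Iterable, Mapping


def position_bias(winners: Iterable[str]) -> tuple[float, float]:
    """Share of matches won by position A, plus a z-score against the
    no-bias value 0.5 (normal approximation to the binomial)."""
    w = list(winners)
    if not w:
        raise ValueError("no matches")
    p = sum(x == "model_a" for x in w) / len(w)
    z = (p - 0.5) / math.sqrt(0.25 / len(w))
    return p, z


def longer_response_win_rate(matches: Iterable[Mapping[str, str]]) -> float:
    """Among matches whose two responses differ in length, the share won
    by the longer response (NaN if every pair has equal length)."""
    wins = total = 0
    for m in matches:
        len_a, len_b = len(m["generated_a"]), len(m["generated_b"])
        if len_a == len_b:
            continue  # length cannot explain this verdict
        total += 1
        longer = "model_a" if len_a > len_b else "model_b"
        wins += m["winner"] == longer
    return wins / total if total else float("nan")


def judge_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of matches where the judge's verdict agrees with the human label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```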

precommit.sh ADDED

	@@ -0,0 +1,5 @@

+# Stage everything so the pre-commit hooks (formatters) check every file,
+# including new ones, and later commits are not blocked by them
+git add .
+pre-commit run --all-files
+# Unstage everything again so you can choose which files to commit
+git restore --staged .

query_comp.py ADDED

	@@ -0,0 +1,137 @@

+# Wrappers that read widget selections from, and write them to, URL query parameters
+from __future__ import annotations
+import hashlib
+import re
+import urllib.parse
+from typing import Callable, TypeVar
+import streamlit as st
+__all__ = ["QueryWrapper", "get_base_url"]
+T = TypeVar("T")
+def SHA1(msg: str) -> str:
+    """Short (8 hex chars) SHA-1 digest, used to encode values in query strings."""
+    return hashlib.sha1(msg.encode()).hexdigest()[:8]
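+def get_base_url():
+    # Uses private Streamlit internals (`_session_mgr`) and reads the host from the
+    # first active session, which is not necessarily the current user's session.
+    session = st.runtime.get_instance()._session_mgr.list_active_sessions()[0]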
+    return urllib.parse.urlunparse(
+        [session.client.request.protocol, session.client.request.host, "", "", "", ""]
+    )
+class QueryWrapper:
+    queries: dict[str, _QueryWrapper] = {}  # registry of all wrappers; used by get_sharable_link()
+    def __init__(self, query: str, label: str | None = None, use_hash: bool = True):
+        self.__wrapper = QueryWrapper.queries[query] = _QueryWrapper(
+            query, label, use_hash
+        )
+    def __call__(self, *args, **kwargs):
+        return self.__wrapper(*args, **kwargs)
+    @classmethod
+    def get_sharable_link(cls):
+        # for k, v in cls.queries.items():
+        #     print(f"{k}: {v}")
+        return re.sub(
+            "&+", "&", "&".join([str(v) for k, v in cls.queries.items()])
+        ).strip("&")
+class _QueryWrapper:
+    ILLEGAL_CHARS = "&/=?"
+    def __init__(self, query: str, label: str | None = None, use_hash: bool = True):
+        self.query = query
+        self.label = label or query
+        self.use_hash = use_hash
+        self.hash_table = {}
+        self.key = None
+    def __call__(
+        self,
+        base_container: Callable,
+        legal_list: list[T],
+        default: T | list[T] | None = None,
+        *,
+        key: str | None = None,
+        **kwargs,
+    ) -> T | list[T] | None:
+        val_from_query = st.query_params.get_all(self.query.lower())
+        # print(val_from_query)
+        legal = len(val_from_query) > 0
+        self.key = key or self.label
+        self.hash_table = {SHA1(str(v)): v for v in legal_list}
+        # drop values that map to no legal option (otherwise the hash_table lookup below raises KeyError)
+        if legal:
+            val_from_query = [v for v in val_from_query if v in self.hash_table]
+        # print(self.label, val_from_query, legal)
+        if legal:
+            selected = [self.hash_table[v] for v in val_from_query]
+        elif default:
+            selected = default
+        elif self.key in st.session_state:
+            selected = st.session_state[self.key]
+            if legal_list:
+                if isinstance(selected, list):
+                    selected = [v for v in selected if v in legal_list]
+                elif selected not in legal_list:
+                    selected = []
+        else:
+            selected = []
+        if selected is None:
+            pass
+        elif len(selected) == 1 and base_container in [st.selectbox, st.radio]:
+            selected = selected[0]
+        # print(self.label, selected)
+        if base_container == st.checkbox:
+            # st.checkbox takes no options/index; treat any non-empty selection as checked
+            selected = base_container(
+                self.label,
+                value=bool(selected),
+                key=self.key,
+                **kwargs,
+            )
+        elif base_container == st.multiselect:
+            selected = base_container(
+                self.label, legal_list, default=selected, key=self.key, **kwargs
+            )
+        elif base_container in (st.radio, st.selectbox):
+            selected = base_container(
+                self.label,
+                legal_list,
+                index=legal_list.index(selected) if selected in legal_list else None,
+                key=self.key,
+                **kwargs,
+            )
+        else:
+            selected = base_container(self.label, legal_list, key=self.key, **kwargs)
+        return st.session_state[self.key]
+    def __str__(self):
+        selected = st.session_state.get(self.key, None)
+        if isinstance(selected, str):
+            return f"{self.query.lower()}={SHA1(selected)}"
+        elif isinstance(selected, list):
+            return "&".join([f"{self.query.lower()}={SHA1(str(v))}" for v in selected])
+        else:
+            return ""
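`_QueryWrapper` never writes raw option values into the URL: `__str__` emits an 8-character SHA-1 prefix per selected value, and `__call__` maps those prefixes back through `hash_table`, dropping any it does not recognize. The Streamlit-free sketch below reproduces that round trip so it can be checked in isolation. `to_query` and `from_query` are hypothetical helper names, and `urllib.parse` stands in for `st.query_params`; this is an illustration, not the module's API.

```python
import hashlib
from urllib.parse import parse_qs, urlencode


def sha1_prefix(msg: str) -> str:
    # same scheme as SHA1() above: first 8 hex chars of the SHA-1 digest
    return hashlib.sha1(msg.encode()).hexdigest()[:8]


def to_query(name: str, selected: list) -> str:
    # one name=hash pair per selected value, as _QueryWrapper.__str__ builds them
    return urlencode([(name.lower(), sha1_prefix(str(v))) for v in selected])


def from_query(query: str, name: str, legal_list: list) -> list:
    # rebuild the hash -> value table from the currently legal options
    hash_table = {sha1_prefix(str(v)): v for v in legal_list}
    hashes = parse_qs(query).get(name.lower(), [])
    # hashes that match no legal option (stale or hand-edited links) are dropped
    return [hash_table[h] for h in hashes if h in hash_table]


judges = ["gpt-4o-mini", "gpt-4o-2024-05-13", "gpt-4o-2024-08-06"]
link = to_query("Judge", ["gpt-4o-mini", "gpt-4o-2024-08-06"])
print(link)
print(from_query(link, "Judge", judges))
```

Hashing keeps links short and free of the characters listed in `ILLEGAL_CHARS`; the trade-off is that a shared link silently loses a selection once that option's label changes.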

requirements.txt ADDED

	@@ -0,0 +1,14 @@

+openai>=1.17.0
+munch
+pandas
+numpy
+tqdm>=4.48.0
+plotly
+scikit-learn
+kaleido
+tiktoken>=0.7.0
+pyyaml
+transformers
+streamlit>=1.40.2
+openpyxl
+git+https://github.com/shobrook/openlimit.git#egg=openlimit

run.sh ADDED

	@@ -0,0 +1,4 @@

+#!/bin/bash
+# Run this from here (cwd = ./streamlit_app_components/)
+# VARCO_ARENA_RESULT_PATH=./user_submit streamlit run app.py
+streamlit run app.py

streamlit_app_local/.streamlit/config.toml ADDED

	@@ -0,0 +1,4 @@

+[client]
+showSidebarNavigation = false
+[server]
+headless = true

streamlit_app_local/README.md ADDED

	@@ -0,0 +1,24 @@

+# Varco Arena web app
+```bash
+cd ./streamlit_app_local/
+bash run.sh
+```
+## view: Running
+![run_page](./page_run.png)
+## view: Results
+![result_page1](./page_result_1.png)
+![result_page2](./page_result_2.png)
+![result_page3](./page_result_3.png)
+![result_page4](./page_result_4.png)
+## how to use
+1. upload LLM outputs in JSON Lines (`.jsonl`) format
+2. estimate the cost
+3. run
+4. open the See Results tab
+## example run results at...
+- `streamlit_app_local/user_submit/`
+- viewable in the web app after `bash run.sh`

streamlit_app_local/__pycache__/analysis_utils.cpython-311.pyc ADDED

Binary file (17.7 kB).

streamlit_app_local/__pycache__/analysis_utils.cpython-38.pyc ADDED

Binary file (9.12 kB).

streamlit_app_local/__pycache__/app.cpython-311.pyc ADDED

Binary file (15.9 kB).

streamlit_app_local/__pycache__/app.cpython-38.pyc ADDED

Binary file (6.32 kB).

streamlit_app_local/__pycache__/query_comp.cpython-311.pyc ADDED

Binary file (8 kB).

streamlit_app_local/__pycache__/view_utils.cpython-311.pyc ADDED

Binary file (18.3 kB).

streamlit_app_local/__pycache__/view_utils.cpython-38.pyc ADDED

Binary file (9.91 kB).

streamlit_app_local/analysis_utils.py ADDED

	@@ -0,0 +1,338 @@

+from copy import deepcopy
+from functools import partial
+from typing import *
+import pandas as pd
+from fire import Fire
+"""
+This code assumes dealing with only one instruction
+"""
+# from varco_arena.tournament
+def log2_power_of_two(n):
+    # First, let's make sure n is indeed a power of 2
+    if n & (n - 1) != 0 or n == 0:
+        raise ValueError("n must be a positive power of 2")
+    exponent = 0
+    while n > 1:
+        n >>= 1  # Right shift is like dividing by 2, but faster
+        exponent += 1
+    return exponent
+def get_1st(df: pd.DataFrame, alpha2names: dict) -> Optional[str]:
+    finals = df[df["round"] == "final"]
+    if len(finals) == 1:
+        first = finals.iloc[0].winner_resolved
+    else:
+        first = (
+            None  # error case (no finals match or multiple finals (buggy result file))
+        )
+    return first
+def get_unique_participants(df: pd.DataFrame) -> list:
+    participants = pd.concat([df.model_a, df.model_b]).unique().tolist()
+    participants = [p for p in participants if p]  # drop None / empty-string placeholders (byes)
+    participants = sorted(participants)  # make it sorted
+    return participants
+def _impute_byes(df):
+    max_depth = df.depth.max()
+    # init
+    imputed_parts = dict()
+    for depth in range(max_depth + 1):
+        imputed_parts[depth] = df[df.depth == depth].copy()
+    # reverse
+    for depth in range(max_depth, 0, -1):  # depth 0 (the final) always has exactly one real match
+        null_v_null = {
+            "model_a": "",
+            "model_b": "",
+            "winner": "model_a",
+            "match_order_in_round": "-",
+            "depth": depth,
+        }
+        # fill some_model vs null byes
+        players = get_unique_participants(imputed_parts[depth])
+        proceeded = get_unique_participants(imputed_parts[depth - 1])
+        imputed = []
+        for p in proceeded:
+            if p not in players:
+                p_v_null = deepcopy(null_v_null)
+                p_v_null["model_a"] = p
+                imputed.append(p_v_null)
+        imputed_parts[depth] = pd.concat(
+            [
+                imputed_parts[depth],
+                pd.DataFrame(imputed),
+            ],
+            axis="index",
+        )
+        # fill null vs null
+        n_null_v_null = 2 ** (depth) - len(imputed_parts[depth])
+        if n_null_v_null > 0:
+            imputed = pd.DataFrame([null_v_null] * n_null_v_null)
+            imputed_parts[depth] = pd.concat(
+                [
+                    imputed_parts[depth],
+                    imputed,
+                ],
+                axis="index",
+            )
+    df_imputed = pd.concat(imputed_parts.values(), axis="index")
+    df_imputed = df_imputed.sort_values(by="depth").reset_index(drop=True)
+    return df_imputed
+def index_test_scenario(df) -> pd.DataFrame:
+    df["inst_src"] = "inst: " + df.instruction + "\n\nsrc: " + df.source
+    df["idx_inst_src"] = df.apply(
+        lambda row: f"{row.tournament_idx}:\n{row.inst_src}", axis=1
+    )
+    # later used for tournament bracket backtracking
+    if "depth" not in df.columns:
+        mappings = {
+            "final": 0,
+            "semi-final": 1,
+            "quarter-final": 2,
+        }
+        def _convert_round_to_depth(rnd: str, mappings=None) -> int:
+            if rnd is None:
+                depth = None
+            elif rnd in mappings.keys():
+                depth = mappings[rnd]
+            elif rnd.startswith("round-"):  # assume perfect power of two
+                num = int(rnd.replace("round-", "").strip())
+                depth = log2_power_of_two(num) - 1
+            else:
+                raise ValueError(f"Unknown round name: {rnd!r}")
+            return depth
+        conv = partial(_convert_round_to_depth, mappings=mappings)
+        df["depth"] = df["round"].apply(conv)
+    return df
+def init_tournament_dataframe(df, alpha2names: dict = None) -> Tuple[pd.DataFrame, dict]:
+    df = df.sort_values(by="depth").reset_index(drop=True)
+    # make winner interpretable (A -> model_a, B -> model_b)
+    df.winner = df.winner.apply(lambda txt: f"model_{txt.lower()}")
+    # define alpha2names if not given (covers up to 168 participants)
+    if alpha2names is None:
+        # implicit concatenation: a backslash continuation inside the literal
+        # would also pull each next line's indentation (spaces) into the string
+        alphabets = (
+            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+            "abcdefghijklmnopqrstuvwxyz"
+            "ⓐⓑⓒⓓⓔⓕⓖⓗⓘⓙⓚⓛⓜⓝⓞⓟⓠⓡⓢⓣⓤⓥⓦⓧⓨⓩ"
+            "㉠㉡㉢㉣㉤㉥㉦㉧㉨㉩㉪㉫㉬㉭"
+            "㉮㉯㉰㉱㉲㉳㉴㉵㉶㉷㉸㉹㉺㉻"
+            "ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ"
+            "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ"
+            "αβγδεζηθικλμνξοπρστυφχψω"
+        )
+        model_full_names = get_unique_participants(df)
+        alpha2names = dict(zip(alphabets, model_full_names))
+        if len(alpha2names) < len(model_full_names):
+            raise ValueError(
+                f"Tournament viewer cannot visualize more than {len(alphabets)=} participants ({len(model_full_names)=} given).\n\nOnly the tournament visualizer is affected; other features still work."
+            )
+    names2alpha = dict(zip(alpha2names.values(), alpha2names.keys()))
+    df = _impute_byes(df)
+    # preserve readables for later
+    df = _make_readables(df, names2alpha)
+    if len(df[df["round"] == "final"]) != 1:
+        raise ValueError("There must be exactly one final match.")
+    return df, alpha2names
+def _make_readables(df, names2alpha):
+    df["human_readable_model_a"] = df.model_a.copy()
+    df["human_readable_model_b"] = df.model_b.copy()
+    df.model_a = df.model_a.apply(
+        lambda modelname: names2alpha[modelname] if modelname else "x"
+    )
+    df.model_b = df.model_b.apply(
+        lambda modelname: names2alpha[modelname] if modelname else "x"
+    )
+    df["human_readable_idx"] = df.apply(
+        lambda row: f"{row.name}: {row.human_readable_model_a} ({row.model_a}) vs. {row.human_readable_model_b} ({row.model_b if row.model_b else 'x'})",
+        axis=1,
+    )
+    df["winner_resolved"] = df.apply(lambda row: row[row.winner], axis=1)
+    df["winner_nodes"] = df.apply(
+        lambda row: f"{row.winner_resolved}:{row.name}".ljust(4, " "), axis=1
+    )  # later for figure representation of winner as a "node"
+    return df
+# draw
+def draw(df: pd.DataFrame, alpha2names: dict = None) -> str:
+    def _draw_round(
+        df: pd.DataFrame,
+        depth: int = None,
+        winners_in_order: list = None,
+    ) -> Tuple:
+        df_now = df[df.depth == depth]
+        max_depth = df.depth.max()
+        width = 2 ** ((max_depth - depth) + 2)
+        connect_left = "─" * (width)
+        connect_left = connect_left[4:]
+        connect_right = " " * (width)
+        connect_right = "┐" + connect_right[1:]
+        if winners_in_order is None:
+            assert (
+                depth == 0
+            ), f"{winners_in_order=} is only allowed when drawing the top (=final match)"
+            winners_in_order = df_now.winner_nodes
+        round_drawing_parts = []
+        descending_round_winners = []
+        for node in winners_in_order:
+            round_drawing_parts.append("".join([node, connect_left, connect_right]))
+            # next round winners in sync with winner order
+            row_now = df_now.query(f"winner_nodes=='{node}'")
+            descending_round_winners.append(row_now.model_a.item())
+            descending_round_winners.append(row_now.model_b.item())
+        # find descending_round_winners within winner_nodes format (num:alpha)
+        if depth == max_depth:
+            pass  # keep the descending_round_winners intact
+        else:
+            df_descend = df[df.depth == depth + 1]
+            for i, winner_alpha in enumerate(descending_round_winners):
+                node_intr = df_descend.query(
+                    f"winner_resolved=='{winner_alpha}'"
+                ).winner_nodes.item()
+                descending_round_winners[i] = node_intr
+        round_drawing = "".join(round_drawing_parts)
+        descending_unit = " " * width
+        descending_unit = "│" + descending_unit[1:]
+        descending_lines_parts = [descending_unit] * len(df_now) * 2
+        descending_lines = "".join(descending_lines_parts)
+        return round_drawing, descending_lines, descending_round_winners
+    drawings = []
+    winners_in_order = None
+    max_depth = df.depth.max()
+    for depth in range(max_depth + 1):
+        winner_drw, lines_desc, winners_in_order = _draw_round(
+            df,
+            depth=depth,
+            winners_in_order=winners_in_order,
+        )
+        drawings.append((winner_drw, lines_desc))
+    # prepare bracket top
+    champion_alphabet = drawings[0][0].split()[0].split(":")[0]
+    champion_readable = alpha2names[champion_alphabet]
+    bracket_top = [f"🥇winner: {champion_readable}", "│"]
+    # prepare mid
+    bracket_mid = "\n".join(["\n".join(tup) for tup in drawings])
+    # prepare bot
+    initial_participants = winners_in_order
+    bracket_bot = (" " * 3).join(initial_participants)
+    full_figure = "\n".join(bracket_top + [bracket_mid, bracket_bot])
+    return full_figure
+def number_breakdown_from_df(result_df: pd.DataFrame) -> Tuple[str, int, int]:
+    n_models = len(get_unique_participants(result_df))
+    size_testset = int(len(result_df) / (n_models - 1))
+    interpretation = f"total {len(result_df)} matches = (n_models-1) * size_testset = ({n_models}-1) * {size_testset}"
+    return interpretation, n_models, size_testset
+def make_legend_str(df, alpha2names) -> str:
+    first = get_1st(df, alpha2names)
+    alpha2names = {k: v.replace("🥇 ", "") for k, v in alpha2names.items()}
+    alpha_ordered = sorted(list(alpha2names.keys()))
+    # names_ordered = sorted(list(alpha2names.values()))
+    # name2alpha = {v: k for k, v in alpha2names.items()}
+    for k, v in alpha2names.items():
+        if v == alpha2names[first]:
+            alpha2names[k] = f"🥇 {v}"
+    res_str = f"\n\nlegend:"
+    # for name in names_ordered:
+    # alpha = name2alpha[name]
+    for alpha in alpha_ordered:
+        name_w_medal = alpha2names[alpha]
+        res_str += f"\n{alpha}\t{name_w_medal}"
+    return res_str
+def main(
+    jslname: str = "result.json",
+):
+    """
+    Test driver: draws each scenario's bracket to `{i}.txt`; failures go to `{i}_err.txt` / `{i}_err.jsonl`.
+    """
+    df = pd.read_json(jslname, orient="records")
+    df = df.drop(columns=["tstamp", "logs"])
+    df = index_test_scenario(df)
+    # visualization(df) would normally run here but is skipped. If you follow every other step and still hit a problem, it's a viz problem (though that seems unlikely).
+    selections = df.idx_inst_src.unique()
+    alpha2names = None  # built from the first scenario, then reused for the rest
+    for i, sel in enumerate(selections):
+        df_now_processed = None  # stays None if preprocessing fails
+        try:
+            df_now = df[df.idx_inst_src == sel]
+            df_now_processed, _alpha2names = init_tournament_dataframe(
+                df_now, alpha2names=alpha2names
+            )
+            if alpha2names is None:
+                alpha2names = _alpha2names
+            bracket_drawing = draw(df_now_processed, alpha2names=alpha2names)
+            legend = make_legend_str(df_now_processed, alpha2names)
+            print(bracket_drawing + legend)
+            with open(f"{i}.txt", "w") as f:
+                print(bracket_drawing + legend, file=f)
+                print(f"\n\n{sel}", file=f)
+            for match_idx_human in df_now_processed.human_readable_idx:
+                match_idx = int(match_idx_human.split(": ")[0])
+                row = df_now_processed.loc[match_idx]
+                winner = row.winner
+        except Exception as e:
+            with open(f"{i}_err.txt", "w") as f:
+                print(e, file=f)
+                print("", file=f)
+                print(sel, file=f)
+            if df_now_processed is not None:
+                df_now_processed[
+                    [
+                        "depth",
+                        "round",
+                        "winner_nodes",
+                        "winner_resolved",
+                        "winner",
+                        "model_a",
+                        "model_b",
+                    ]
+                ].to_json(f"{i}_err.jsonl", lines=True, orient="records")
+if __name__ == "__main__":
+    Fire(main)
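`index_test_scenario` turns round names into bracket depths: final = 0, semi-final = 1, quarter-final = 2, and `round-N` = log2(N) − 1, where N is read as the number of players still in that round. A standalone sketch of that mapping, which raises `ValueError` for unrecognized names; `round_to_depth` and `NAMED_ROUNDS` are illustrative names, and N is assumed to be a power of two, as the original comment states.

```python
NAMED_ROUNDS = {"final": 0, "semi-final": 1, "quarter-final": 2}


def round_to_depth(rnd: str) -> int:
    """0 for the final, +1 for each round further away from it."""
    if rnd in NAMED_ROUNDS:
        return NAMED_ROUNDS[rnd]
    if rnd.startswith("round-"):
        n_players = int(rnd[len("round-"):])
        # a power of two has exactly one bit set, so n & (n - 1) == 0
        if n_players < 2 or n_players & (n_players - 1):
            raise ValueError(f"{rnd!r}: player count must be a power of two >= 2")
        # log2(n) - 1 levels below the final; for a power of two,
        # bit_length() - 1 is an exact integer log2 (no floating point)
        return n_players.bit_length() - 2
    raise ValueError(f"unknown round name: {rnd!r}")


for name in ["final", "semi-final", "quarter-final", "round-16", "round-32"]:
    print(name, round_to_depth(name))
```

Using `bit_length()` instead of `math.log2` avoids any float rounding; the original reaches the same integer with a right-shift loop in `log2_power_of_two`.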

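`_impute_byes` makes every depth `d` hold exactly `2**d` match rows so `draw` can lay the bracket out as a full binary tree: a player who appears one level closer to the final without having played at level `d` gets a player-vs-empty bye row, and any slots still missing become empty-vs-empty rows. A dict-based sketch of that padding for a single level; `pad_round` is a hypothetical name, and the real function operates on DataFrames, walking from the deepest level up to depth 1.

```python
from copy import deepcopy


def pad_round(matches: list, advanced: list, depth: int) -> list:
    """Pad one bracket level to 2**depth match slots (per-level view of _impute_byes)."""
    empty = {"model_a": "", "model_b": "", "winner": "model_a", "depth": depth}
    played = {m[side] for m in matches for side in ("model_a", "model_b") if m[side]}
    padded = list(matches)
    # anyone who appears one level up without playing at this level had a bye
    for player in advanced:
        if player not in played:
            bye = deepcopy(empty)
            bye["model_a"] = player
            padded.append(bye)
    # leftover slots become empty-vs-empty placeholders so the level stays full
    padded += [deepcopy(empty) for _ in range(2**depth - len(padded))]
    return padded


# 3 models: A beat B in the semi-final while C advanced on a bye
semis = [{"model_a": "A", "model_b": "B", "winner": "model_a", "depth": 1}]
for m in pad_round(semis, advanced=["A", "C"], depth=1):
    print(m["model_a"] or "x", "vs", m["model_b"] or "x")
```

The `"x"` printed for an empty side mirrors how `_make_readables` renders missing players.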
streamlit_app_local/app.py ADDED

	@@ -0,0 +1,318 @@

+# import shutil
+import os
+import select
+import subprocess
+import sys
+import time
+from datetime import datetime, timedelta, timezone
+from pathlib import Path
+from typing import *
+import streamlit as st
+sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
+from varco_arena_core.prompts import load_prompt
+from view_utils import (
+    default_page_setting,
+    escape_markdown,
+    set_nav_bar,
+    show_linebreak_in_md,
+)
+# import sys
+# print(sys.executable)
+VA_ROOT = Path(os.environ.get("VARCO_ARENA_RESULT_PATH", "./user_submit"))
+USR_SUB = VA_ROOT.parts[-1]
+def upload_files(uploaded_files) -> Optional[Path]:
+    # prep directory for user submission
+    user_sub_root = VA_ROOT
+    if user_sub_root.exists():
+        if not user_sub_root.is_dir():
+            raise ValueError(
+                f"{user_sub_root} file exists and is not a directory. Consider renaming it."
+            )
+    else:
+        user_sub_root.mkdir(parents=True)
+    KST = timezone(timedelta(hours=9))
+    tstamp = datetime.now(KST)
+    tstr = tstamp.strftime("%m-%d_%H:%M:%S")
+    files_dir = user_sub_root / tstr  # also correct when VARCO_ARENA_RESULT_PATH is absolute
+    files_dir.mkdir(parents=True, exist_ok=True)
+    uploaded_files = list(uploaded_files)
+    if not uploaded_files:
+        st.warning("❌ No files to upload. Please drag/drop or browse files to upload.")
+    elif len(uploaded_files) < 2:
+        st.error("❌ You need at least 2 jsonlines files to properly run VA.")
+    else:  # properly uploaded
+        for file in uploaded_files:
+            # Create a path for the file in the server directory
+            file_path = files_dir / file.name
+            # Save the file to the server directory
+            with open(file_path, "wb") as f:
+                f.write(file.getbuffer())
+        jslfiles = list(files_dir.glob("*.jsonl"))
+        st.success(f"✅ Successfully uploaded {len(jslfiles)} jsonl files.")
+        return files_dir.resolve()
+def run_varco_arena(
+    price_estimation: bool = False,
+    # upload_dir: Union[str, Path] = None,
+    promptname: str = None,
+    exp_name: str = None,
+    api_key: Optional[str] = None,
+    evaluation_model: str = "gpt-4o-mini",
+    update_interval: float = 1.0,
+):
+    # NOTE: the API key is passed on the command line; only a masked copy is printed or shown
+    ptn = f"{str(st.session_state.upfiles_dir)}"
+    outdir = Path(ptn)
+    if exp_name:
+        outdir = outdir / exp_name
+    command = f"python ../varco_arena/main.py -i {ptn} -o {outdir} -k {api_key} -p {promptname} -e {evaluation_model} -j 64"
+    if price_estimation:
+        command = f"{command} -c"
+    else:
+        command = f"yes | {command}"  # pipe "y" into any confirmation prompt
+    display_command = command.replace(f"-k {api_key}", "-k ***")
+    print(display_command)
+    api_key = None  # clear immediately
+    process = subprocess.Popen(
+        command,
+        stdout=subprocess.PIPE,
+        stderr=subprocess.STDOUT,
+        stdin=subprocess.PIPE,
+        text=True,
+        bufsize=1,
+        shell=True,
+    )
+    # Set stdout to non-blocking mode
+    os.set_blocking(process.stdout.fileno(), False)
+    last_update_time = time.time()
+    terminal_output = st.empty()
+    full_output = f"{display_command}\n"
+    to_show = full_output  # defined up front; the tail handling below appends to it
+    while True:
+        # Wait up to 0.1 s for output (avoids a busy loop while the process is quiet)
+        if select.select([process.stdout], [], [], 0.1)[0]:
+            output = process.stdout.readline()
+            if output:
+                full_output += output
+                if price_estimation:
+                    to_show = full_output
+                    terminal_output.code(to_show, language="bash")
+                else:
+                    current_time = time.time()
+                    if current_time - last_update_time > update_interval:
+                        lines = full_output.split("\n")
+                        if len(lines) < 5:
+                            to_show = full_output
+                        else:
+                            to_show = "\n".join(["...\n..\n.\n"] + lines[-5:])
+                        terminal_output.code(to_show, language="bash")
+                        last_update_time = current_time
+                print(output)
+        # Check if the process has finished
+        if process.poll() is not None:
+            # Read any remaining output
+            remaining_output = process.stdout.read()
+            if remaining_output:
+                lines = remaining_output.split("\n")
+                if len(lines) > 10:
+                    to_show += "\n".join(["\n...\n..\n.\n"] + lines[-10:])
+                else:
+                    to_show += remaining_output
+                terminal_output.code(to_show, language="bash")
+                print(remaining_output)
+            break
+    return_code = process.poll()
+    return outdir, return_code
+def main():
+    # init lang
+    st.session_state["korean"] = st.session_state.get("korean", False)
+    sidebar_placeholder = default_page_setting()
+    set_nav_bar(
+        False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="app_init"
+    )
+    st.title("⚔️ VARCO ARENA ⚔️")
+    if st.session_state.korean:
+        st.write(
+            "**VARCO Arena는 각 모델의 생성된 결과를 비교 평가하여 모델의 성능 순위를 제공하는 시스템입니다. 커스텀 테스트셋에 유용하게 활용할 수 있으며, 모범답안을 필요로 하지 않습니다.**"
+        )
+    else:
+        st.write(
+            "**VARCO Arena is an LLM benchmarking system that compares model responses across customized test scenarios without requiring reference answers.**"
+        )
+    st.divider()
+    # Set up the file uploader
+    if st.session_state.korean:
+        st.markdown("### 1. 모델 출력파일 업로드")
+    else:
+        st.markdown("### 1. Upload LLM responses")
+    uploaded_files = st.file_uploader(
+        "Drag and Drop jsonlines files (.jsonl)", accept_multiple_files=True
+    )
+    # upload state
+    if "upfiles_dir" not in st.session_state:
+        st.session_state.upfiles_dir = None
+    if st.button("Upload Files"):
+        st.session_state.upfiles_dir = upload_files(uploaded_files)
+        # st.success(st.session_state.upfiles_dir)
+    # st.markdown("**💥주의: 중복된 테스트 시나리오는 오류로 처리됩니다💥**")
+    if st.session_state.korean:
+        with st.expander("❓❔ 무엇을 업로드 하나요❓❔"):
+            st.info(open("guide_mds/input_jsonls_kr.md", encoding="UTF8").read())
+    else:
+        with st.expander("❓❔  What should I upload ❓❔"):
+            st.info(open("guide_mds/input_jsonls_en.md", encoding="UTF8").read())
+    # Form for cost estimation
+    with st.form("cost_estimation_form"):
+        if st.session_state.korean:
+            st.write("### 2. 가격 산정")
+        else:
+            st.write("### 2. Cost Estimation")
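+        eval_model = st.selectbox(
+            "Select Judge",
+            # skip blank lines (e.g. a trailing newline) so they don't become options
+            [m for m in Path("eval_models_list.txt").read_text(encoding="UTF8").splitlines() if m.strip()],
+        )
+        promptname = st.selectbox(
+            "Select Evaluation Prompt",
+            [p for p in Path("eval_prompt_list.txt").read_text(encoding="UTF8").splitlines() if p.strip()],
+        )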
+        if promptname == USR_SUB:
+            raise ValueError(
+                f"{USR_SUB=} is a reserved name for the system. Choose another prompt name or change {VA_ROOT=} (USR_SUB == VA_ROOT.parts[-1])."
+            )
+        estimate_button = st.form_submit_button("Calculate Cost!")
+        with st.expander(
+            "LLM Judge에 활용되는 프롬프트 (`Calculate Cost!` 클릭시 갱신)"
+            if st.session_state.korean
+            else "**Evaluation Prompt for LLM Judge (will refresh after `Calculate Cost!` clicked)**"
+        ):
+            prompt = load_prompt(promptname, task="-")
+            kwargs = dict(
+                inst="{inst}",
+                src="{src}",
+                out_a="{out_a}",
+                out_b="{out_b}",
+                task="-",
+            )
+            if promptname == "translation_pair":
+                kwargs["source_lang"] = "{source_lang}"
+                kwargs["target_lang"] = "{target_lang}"
+            prompt_cmpl = prompt.complete_prompt(**kwargs)
+            st.markdown(f"### Evaluation Prompt: {promptname}")
+            for msg in prompt_cmpl:
+                st.markdown(f"**{msg['role']}**")
+                st.info(show_linebreak_in_md(escape_markdown(msg["content"])))
+        if estimate_button:
+            if st.session_state.get("upfiles_dir") is None:
+                st.error(
+                    "❌ Requirements: You have to upload jsonlines files first to proceed"
+                )
+            else:
+                st.markdown("##### Estimated Cost")
+                dummy_api_key = "dummy"
+                dummy_exp_name = "dummy"
+                result_file_path, return_code = run_varco_arena(
+                    # upload_dir=st.session_state.upfiles_dir,
+                    promptname=promptname,
+                    api_key=dummy_api_key,
+                    exp_name=dummy_exp_name,
+                    price_estimation=True,
+                    evaluation_model=eval_model,
+                )
+                if return_code:
+                    st.error("❌ RuntimeError: An error occurred during cost estimation")
+                else:
+                    st.success("✅ Cost estimation completed successfully")
+                    st.session_state.cost_estimated = True
+    # Form for actual run
+    with st.form("run_arena_form"):
+        if st.session_state.korean:
+            st.write("### 3. Varco Arena 구동하기")
+        else:
+            st.write("### 3. Run Varco Arena")
+        api_key = st.text_input("Enter your OpenAI API Key", type="password")
+        exp_name = st.text_input("(Optional) Enter Exp. name")
+        exp_name = exp_name.replace(
+            "..", "_"
+        )  # May cause rmtree problem later. Block it.
+        exp_name = exp_name.replace(
+            USR_SUB, f"-{USR_SUB}-"
+        )  # May cause rmtree problem later. Block it.
+        exp_name = exp_name.replace("/", "-")
+        exp_name = exp_name.replace(" ", "_")
+        exp_name = exp_name.replace("~", "_")
+        if st.session_state.korean:
+            st.write("**주의**:`Ctrl+C` 버튼은 구현되지 않았습니다. 구동 전 숙고해주세요.")
+        else:
+            st.write("**Caution: a `Ctrl+C` (stop) button has not been implemented. Think carefully before running.**")
+        run_button = st.form_submit_button(
+            "🔥 Run Arena!",
+            disabled=(not st.session_state.get("cost_estimated", False))
+            or "result_file_path"
+            in st.session_state.keys(),  # run already performed once
+        )
+        if run_button:
+            set_nav_bar(
+                True,
+                sidebar_placeholder=sidebar_placeholder,
+                toggle_hashstr="app_during_run",
+            )
+            if st.session_state.get("upfiles_dir") is None:
+                st.error(
+                    "❌ Requirements: You have to upload jsonlines files first to proceed"
+                )
+            elif not api_key:
+                st.error("❌ Requirements: OpenAI key required to run VA.")
+            else:
+                result_file_path, return_code = run_varco_arena(
+                    # upload_dir=st.session_state.upfiles_dir,
+                    promptname=promptname,
+                    api_key=api_key,
+                    exp_name=exp_name,
+                    price_estimation=False,
+                    evaluation_model=eval_model,
+                )
+                if return_code:
+                    st.error("❌ RuntimeError: An error occurred during Varco Arena run")
+                else:
+                    st.success("✅ Varco Arena run completed successfully")
+                    st.session_state.result_file_path = result_file_path
+    set_nav_bar(
+        False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="app_run_done"
+    )
+if __name__ == "__main__":
+    main()
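`run_varco_arena` builds one shell string and runs it with `shell=True`, so everything interpolated into it (upload paths, the experiment name, the API key) is parsed by the shell. A sketch of the alternative: keep the command as an argument list, suitable for `subprocess.Popen` without `shell=True`, and render a masked copy only for display. `build_arena_args` and `printable` are hypothetical helpers; the flags are copied from the original command, and the `yes |` auto-confirmation is not reproduced here and would need separate handling.

```python
import shlex


def build_arena_args(input_dir, output_dir, api_key, prompt, judge, cost_only=False):
    # same flags as the f-string command in run_varco_arena()
    args = [
        "python", "../varco_arena/main.py",
        "-i", str(input_dir), "-o", str(output_dir),
        "-k", api_key, "-p", prompt, "-e", judge, "-j", "64",
    ]
    if cost_only:
        args.append("-c")  # cost estimation only
    return args


def printable(args):
    # replace the value that follows -k so the key never reaches logs or the UI
    masked = list(args)
    for i, arg in enumerate(masked[:-1]):
        if arg == "-k":
            masked[i + 1] = "***"
    # shlex.join quotes each argument the way a POSIX shell would need it
    return shlex.join(masked)


args = build_arena_args(
    "user_submit/run 1", "user_submit/run 1/exp", "sk-secret",
    "llmbar", "gpt-4o-mini", cost_only=True,
)
print(printable(args))
```

Because no shell ever sees the list, a space or `;` in the experiment name stays part of one argument instead of splitting or chaining commands.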

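The polling loop in `run_varco_arena` combines `select`, a non-blocking pipe and a throttled refresh to show the last few lines of output. If blocking on each line is acceptable, the same "stream everything, keep the tail" behaviour can be sketched with plain iteration and a bounded `deque`. `run_and_tail` is a hypothetical name; in the app each line would go to `terminal_output.code(...)` rather than `print`.

```python
import subprocess
import sys
from collections import deque


def run_and_tail(args, tail=5):
    """Run args, echo every output line, and return (exit code, last `tail` lines)."""
    last_lines = deque(maxlen=tail)  # older lines fall off automatically
    with subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr, as the original does
        text=True,
        bufsize=1,
    ) as proc:
        for line in proc.stdout:  # blocks until a full line or EOF arrives
            print(line, end="")
            last_lines.append(line.rstrip("\n"))
    # leaving the with-block waits for the process, so returncode is set
    return proc.returncode, list(last_lines)


code, last = run_and_tail([sys.executable, "-c", "for i in range(8): print(i)"], tail=3)
print(code, last)
```

Iterating the pipe ends cleanly at EOF, which removes the separate "read whatever remains after `poll()`" step.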
streamlit_app_local/eval_models_list.txt ADDED

	@@ -0,0 +1,3 @@

+gpt-4o-mini
+gpt-4o-2024-05-13
+gpt-4o-2024-08-06

streamlit_app_local/eval_prompt_list.txt ADDED

	@@ -0,0 +1,5 @@

+llmbar
+llmbar_brief
+translation_pair
+rag_pair_kr
+contextual (WIP)

streamlit_app_local/guide_mds ADDED

	@@ -0,0 +1 @@


1	+ ../guide_mds/

streamlit_app_local/modules/__pycache__/nav.cpython-311.pyc ADDED

Binary file (2.85 kB).

streamlit_app_local/modules/__pycache__/nav.cpython-38.pyc ADDED

Binary file (889 Bytes).

streamlit_app_local/modules/nav.py ADDED

	@@ -0,0 +1,46 @@

+import streamlit as st
+def Navbar(sidebar_placeholder, toggle_hashstr: str = ""):
+    with sidebar_placeholder.container():
+        st.toggle(
+            "🇰🇷 한국어로 보기",
+            value=st.session_state.get("korean", False),
+            key=f"korean_toggle_{toggle_hashstr}",  # Add this explicit key
+            on_change=lambda: setattr(
+                st.session_state,
+                "korean",
+                st.session_state.get(f"korean_toggle_{toggle_hashstr}", False),
+            ),
+        )
+        st.page_link(
+            "app.py",
+            label="Varco Arena 구동" if st.session_state.korean else "Run VARCO Arena",
+            icon="🔥",
+        )
+        st.page_link(
+            "pages/see_results.py",
+            label="결과 보기" if st.session_state.korean else "See Results",
+            icon="📝",
+            disabled=st.session_state.get("is_running", False),
+        )
+        st.page_link(
+            "pages/brief_intro.py",
+            label="어떻게 작동하나요?" if st.session_state.korean else "How it Works",
+            icon="❔",
+            disabled=st.session_state.get("is_running", False),
+        )
+        related_links_en = """
+**About**
+* [Paper](https://huggingface.co/papers/2411.01281)
+* [Blog (KR)](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
+* [Inquiry](https://linkedin.com/in/deftson/)
+""".strip()
+        related_links_kr = """
+**About**
+* [논문](https://huggingface.co/papers/2411.01281)
+* [블로그](https://ncsoft.github.io/ncresearch/12cc62c1ea0d981971a8923401e8fe6a0f18563d)
+* [문의](https://linkedin.com/in/deftson/)
+""".strip()
+        st.info(related_links_kr if st.session_state.korean else related_links_en)

streamlit_app_local/pages/brief_intro.py ADDED

	@@ -0,0 +1,40 @@

+import streamlit as st
+from view_utils import default_page_setting, set_nav_bar
+sidebar_placeholder = default_page_setting()
+set_nav_bar(
+    False, sidebar_placeholder=sidebar_placeholder, toggle_hashstr="brief_intro_init"
+)
+st.image("va_en.png")
+if st.session_state.korean:
+    st.info("여러분이 업로드하는 파일은 위 그림의 LLM 출력들입니다.")
+    with st.expander("❓❔ 무엇을 업로드 하나요❓❔"):
+        st.info(open("guide_mds/input_jsonls_kr.md", encoding="UTF8").read())
+else:
+    st.info(
+        "What you upload are the LLM responses generated for the test prompts."
+    )
+    with st.expander("❓❔ What should I upload? ❓❔"):
+        st.info(open("guide_mds/input_jsonls_en.md", encoding="UTF8").read())
+st.image("va_concept_new.png")
+st.markdown(
+    """
+| |Current Practice|Varco Arena|
+|-|-|-|
+|Total no. matches|$$n_{\\text{model}}*\\|X\\|$$|$$(n_{\\text{model}}-1)*\\|X\\|$$|
+|No. matches per LLM|$$\\|X\\|$$|$$\\left[\\|X\\|,\\|X\\|\\text{log}n_{\\text{model}}\\right]$$|
+|Comparison type|reference-mediated (anchored) |direct|
+            """
+)
+if st.session_state.korean:
+    st.info(
+        "Varco Arena는 신뢰성 있는 순위를 더 적은 횟수의 비교 내에 얻어내며, 이러한 특징은 LLM 직접 비교의 이점으로부터 기인합니다."
+    )
+else:
+    st.info(
+        "Varco Arena takes advantage of direct comparison between LLM responses to obtain reliable rankings with fewer total matches."
+    )
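
The match counts in the table follow from running one single-elimination bracket per test prompt. With n models, n - 1 matches leave a single winner. Every model plays at least once, and no model plays more than once per round, which caps it at ceil(log2 n) matches. Below is a minimal sketch that checks these numbers. It assumes a plain bracket where an odd model out in a round advances on a bye, and it substitutes a toy `judge` function for the LLM judge. The actual VARCO Arena bracket construction is not part of this diff, and `run_bracket` is a hypothetical name:

```python
import math
from collections import Counter
from typing import Callable, Sequence


def run_bracket(
    models: Sequence[str], judge: Callable[[str, str], str]
) -> tuple[int, Counter]:
    """Play one single-elimination bracket and count matches.

    `judge(a, b)` must return the winner (a or b). An odd model out in a
    round advances on a bye without playing.
    """
    per_model = Counter({m: 0 for m in models})
    total = 0
    alive = list(models)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive) - 1, 2):
            a, b = alive[i], alive[i + 1]
            per_model[a] += 1
            per_model[b] += 1
            total += 1
            next_round.append(judge(a, b))
        if len(alive) % 2 == 1:
            next_round.append(alive[-1])  # bye
        alive = next_round
    return total, per_model


models = ["a", "b", "c", "d", "e"]
total, per_model = run_bracket(models, judge=min)  # toy judge: alphabetical winner
n_prompts = 100  # |X|, a hypothetical test-set size
print(total * n_prompts)        # 400 = (n_model - 1) * |X|
print(len(models) * n_prompts)  # 500 = n_model * |X| with one anchored match per model
print(max(per_model.values()), math.ceil(math.log2(len(models))))  # 3 3
```

The per-model count here depends on the toy judge and the bye placement. The bounds do not: every model falls in [1, ceil(log2 n)] per prompt, which gives the [|X|, |X| log n] range in the table.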

streamlit_app_local/pages/see_results.py ADDED

	@@ -0,0 +1,391 @@

+import pandas as pd
+import streamlit as st
+from varco_arena_core.prompts import load_prompt
+import analysis_utils as au
+from analysis_utils import number_breakdown_from_df
+from app import VA_ROOT
+from query_comp import QueryWrapper, get_base_url
+from view_utils import (
+    default_page_setting,
+    escape_markdown,
+    set_nav_bar,
+    show_linebreak_in_md,
+    visualization,
+)
+DEFAULT_LAYOUT_DICT = {
+    "title": {"font": {"size": 20, "family": "Gothic A1"}},
+    "font": {"size": 16, "family": "Gothic A1"},
+    "xaxis": {"tickfont": {"size": 12, "family": "Gothic A1"}},
+    "yaxis": {"tickfont": {"size": 12, "family": "Gothic A1"}},
+    "legend": {"font": {"size": 12, "family": "Gothic A1"}},
+}
+@st.cache_data
+def load_data():
+    all_result_dict = {}
+    df_dict = {}
+    # result_dir_list = os.listdir(VA_ROOT)
+    # result_dir_list = sorted(result_dir_list, key=lambda x: os.stat(os.path.join(VA_ROOT, x)).st_mtime, reverse=True)
+    for f in VA_ROOT.glob("**/result.json"):
+        print(f)
+    for resjsonf in VA_ROOT.glob("**/result.json"):
+        df = pd.read_json(resjsonf)
+        for col in ["tstamp", "logs"]:
+            if col in df.columns:
+                df.drop(columns=[col], inplace=True)
+        df = au.index_test_scenario(df)
+        fig_dict_per_task = {}
+        df_dict_per_task = {}
+        fig_dict_per_task["Overall"] = visualization(df, is_overall=True)
+        df_dict_per_task["Overall"] = df
+        for task in df["task"].unique().tolist():
+            df_task = df[df["task"] == task]
+            fig_dict_per_task[task] = visualization(df_task, is_overall=False)
+            df_dict_per_task[task] = df_task
+        prm_name = resjsonf.parts[-2]
+        exp_name = resjsonf.parts[-3]
+        key = f"{exp_name}/{prm_name}"
+        all_result_dict[key] = fig_dict_per_task
+        df_dict[key] = df_dict_per_task
+    return all_result_dict, df_dict
+def navigate(t, source, key, val):
+    # print(key, val)
+    if source is None:
+        return
+    target_index = t.index(source) + val
+    if 0 <= target_index < len(t):
+        st.session_state[key] = t[target_index]
+        st.rerun()
+def main():
+    sidebar_placeholder = default_page_setting(layout="wide")
+    set_nav_bar(
+        False,
+        sidebar_placeholder=sidebar_placeholder,
+        toggle_hashstr="see_results_init",
+    )
+    # load the data
+    st.session_state["all_result_dict"], st.session_state["df_dict"] = load_data()
+    # side bar
+    st.sidebar.title("Select Result:")
+    result_select = QueryWrapper("expname", "")(
+        st.sidebar.selectbox,
+        list(st.session_state["all_result_dict"].keys()),
+    )
+    if result_select is None:
+        st.stop()
+    eval_prompt_name = result_select.split("/")[-1].strip()
+    if st.sidebar.button("Clear Cache"):
+        st.cache_data.clear()
+        st.cache_resource.clear()
+        st.rerun()
+    if result_select:
+        if "alpha2names" in st.session_state:
+            del st.session_state["alpha2names"]
+    fig_dict_per_task = st.session_state["all_result_dict"][result_select]
+    task_list = list(fig_dict_per_task.keys())
+    elo_rating_by_task = fig_dict_per_task["Overall"]["elo_rating_by_task"]
+    # tabs = st.tabs(task_list)
+    df_dict_per_task = st.session_state["df_dict"][result_select]
+    default_layout_dict = DEFAULT_LAYOUT_DICT
+    task = QueryWrapper("task", "Select Task")(st.selectbox, task_list)
+    if task is None:
+        st.stop()
+    figure_dict = fig_dict_per_task[task]
+    judgename = figure_dict["judgename"]
+    df = df_dict_per_task[task]
+    interpretation, n_models, size_testset = number_breakdown_from_df(df)
+    if st.session_state.korean:
+        st.markdown(f"## 결과 ({task})")
+        st.markdown(f"##### Judge 모델: {judgename} / 평가프롬: {eval_prompt_name}")
+        st.markdown(f"##### 테스트셋 사이즈: {int(size_testset)} 행")
+    else:
+        st.markdown(f"## Results ({task})")
+        st.markdown(f"##### Judge Model: {judgename} / prompt: {eval_prompt_name}")
+        st.markdown(f"##### Size of Testset: {int(size_testset)} rows")
+    col1, col2 = st.columns(2)
+    with col1:
+        with st.container(border=True):
+            st.markdown(f"#### Ratings ({task})")
+            st.table(figure_dict["elo_rating"])
+            st.write(show_linebreak_in_md(escape_markdown(interpretation)))
+            st.markdown(f"**files at:**     \n`{str(VA_ROOT.resolve()/result_select)}`")
+    with col2:
+        with st.container(border=True):
+            st.plotly_chart(
+                elo_rating_by_task.update_layout(**default_layout_dict),
+                use_container_width=True,
+                key=f"{task}_elo_rating_by_task",
+            )
+    st.divider()
+    if st.session_state.korean:
+        st.markdown("### 토너먼트 (테스트 시나리오) 별로 보기")
+    else:
+        st.markdown("### Tournament Results by Test Scenario")
+    # with st.expander("볼 토너먼트 고르기"):
+    d = list(df.idx_inst_src.unique())
+    default_idx = st.session_state.get("selected_tournament", None)
+    cols = st.columns((1, 18, 1))
+    with cols[0]:
+        if st.button("◀", key="prev_tournament"):
+            navigate(d, default_idx, "selected_tournament", -1)
+    with cols[1]:
+        tournament_prm_select = QueryWrapper("tournament", "Select Tournament")(
+            st.selectbox,
+            d,
+            default_idx,
+            key=f"{task}_tournament_select",
+            on_change=lambda: st.session_state.update(
+                selected_tournament=st.session_state.get(f"{task}_tournament_select"),
+                selected_match=None,
+            ),
+            label_visibility="collapsed",
+        )
+    with cols[2]:
+        if st.button("▶", key="next_tournament"):
+            navigate(d, default_idx, "selected_tournament", 1)
+    # tournament_prm_select = st.selectbox(
+    #     "Select Tournament",
+    #     df.idx_inst_src.unique(),
+    #     index=d.index(st.session_state.get("selected_tournament")),
+    #     key=f"{task}_tournament_{result_select}",
+    # )
+    # print(tournament_prm_select, type(tournament_prm_select))
+    st.session_state["selected_tournament"] = tournament_prm_select
+    # tournament_prm_select = st.selectbox(
+    #     "Select Tournament",
+    #     df.idx_inst_src.unique(),
+    #     key=f"{task}_tournament_{result_select}",
+    # )
+    df_now_processed = None
+    if tournament_prm_select:
+        df_now = df[df.idx_inst_src == tournament_prm_select]
+        df_now_processed, _alpha2names = au.init_tournament_dataframe(
+            df_now,
+            alpha2names=st.session_state["alpha2names"]
+            if "alpha2names" in st.session_state.keys()
+            else None,
+        )
+        if "alpha2names" not in st.session_state:
+            st.session_state["alpha2names"] = _alpha2names
+        try:
+            bracket_drawing = au.draw(
+                df_now_processed,
+                alpha2names=st.session_state["alpha2names"],
+            )
+            legend = au.make_legend_str(
+                df_now_processed, st.session_state["alpha2names"]
+            )
+            st.code(bracket_drawing + legend)
+            m = list(df_now_processed.human_readable_idx)
+            default_idx = st.session_state.get("selected_match", None)
+            cols = st.columns((1, 18, 1))
+            with cols[0]:
+                if st.button("◀", key="prev_match"):
+                    navigate(m, default_idx, "selected_match", -1)
+            with cols[1]:
+                match_idx_human = QueryWrapper("match", "Select Match")(
+                    st.selectbox,
+                    m,
+                    default_idx,
+                    key=f"{task}_match_select",
+                    label_visibility="collapsed",
+                )
+            with cols[2]:
+                if st.button("▶", key="next_match"):
+                    navigate(m, default_idx, "selected_match", 1)
+            # match_idx_human = st.selectbox(
+            #     "Select Match",
+            #     df_now_processed.human_readable_idx,
+            #     key=f"{task}_match_{result_select}",
+            # )
+            # print(match_idx_human)
+            st.session_state["selected_match"] = match_idx_human
+            # match_idx_human = st.selectbox(
+            #     "Select Match",
+            #     df_now_processed.human_readable_idx,
+            #     key=f"{task}_match_{result_select}",
+            # )
+            if match_idx_human:
+                match_idx = int(match_idx_human.split(": ")[0])
+                row = df_now_processed.loc[match_idx]
+                st.markdown("#### Current Test Scenario:")
+                with st.expander(
+                    f"### Evaluation Prompt (evalprompt: {eval_prompt_name}--{task})"
+                ):
+                    prompt = load_prompt(eval_prompt_name, task=task)
+                    kwargs = dict(
+                        inst="{inst}",
+                        src="{src}",
+                        out_a="{out_a}",
+                        out_b="{out_b}",
+                        task=task,
+                    )
+                    if eval_prompt_name == "translation_pair":
+                        kwargs["source_lang"] = "{source_lang}"
+                        kwargs["target_lang"] = "{target_lang}"
+                    prompt_cmpl = prompt.complete_prompt(**kwargs)
+                    for msg in prompt_cmpl:
+                        st.markdown(f"**{msg['role']}**")
+                        st.info(show_linebreak_in_md(escape_markdown(msg["content"])))
+                st.info(show_linebreak_in_md(tournament_prm_select))
+                winner = row.winner
+                col1, col2 = st.columns(2)
+                winnerbox = st.success
+                loserbox = st.error
+                with col1:
+                    iswinner = winner == "model_a"
+                    writemsg = winnerbox if iswinner else loserbox
+                    st.markdown(f"#### ({row.model_a}) {row.human_readable_model_a}")
+                    writemsg(
+                        show_linebreak_in_md(row.generated_a),
+                        icon="✅" if iswinner else "❌",
+                    )
+                with col2:
+                    iswinner = winner == "model_b"
+                    writemsg = winnerbox if iswinner else loserbox
+                    st.markdown(f"#### ({row.model_b}) {row.human_readable_model_b}")
+                    writemsg(
+                        show_linebreak_in_md(row.generated_b),
+                        icon="✅" if iswinner else "❌",
+                    )
+        except Exception as e:
+            import traceback
+            traceback.print_exc()
+            st.markdown(
+                "**Bug: 아래 표를 복사해서 이슈로 남겨주시면 개선에 도움이 됩니다. 감사합니다🙏**"
+                if st.session_state.korean
+                else "**Bug: Please open an issue and attach the table output below to help me out. Thanks in advance.🙏**"
+            )
+            st.error(e)
+            st.info(tournament_prm_select)
+            st.table(
+                df_now_processed[
+                    [
+                        "depth",
+                        "round",
+                        "winner_nodes",
+                        "winner_resolved",
+                        "winner",
+                        "model_a",
+                        "model_b",
+                    ]
+                ]
+            )
+    st.write("Shareable link")
+    st.code(f"{get_base_url()}/see_results?{QueryWrapper.get_sharable_link()}")
+    st.divider()
+    if st.session_state.korean:
+        st.markdown("### 매치 통계")
+    else:
+        st.markdown("### Match Stats.")
+    # two-column grid for the match-statistics charts below
+    col1, col2 = st.columns(2)
+    with col1:
+        with st.container(border=True):
+            st.plotly_chart(
+                figure_dict[
+                    "fraction_of_model_a_wins_for_all_a_vs_b_matches"
+                ].update_layout(autosize=True, **default_layout_dict),
+                use_container_width=True,
+                key=f"{task}_fraction_of_model_a_wins_for_all_a_vs_b_matches",
+            )
+    with col2:
+        with st.container(border=True):
+            st.plotly_chart(
+                figure_dict["match_count_of_each_combination_of_models"].update_layout(
+                    autosize=True, **default_layout_dict
+                ),
+                use_container_width=True,
+                key=f"{task}_match_count_of_each_combination_of_models",
+            )
+    with col1:
+        with st.container(border=True):
+            st.plotly_chart(
+                figure_dict["match_count_for_each_model"].update_layout(
+                    **default_layout_dict
+                ),
+                use_container_width=True,
+                key=f"{task}_match_count_for_each_model",
+            )
+    with col2:
+        pass
+    if st.session_state.korean:
+        st.markdown("### 참고용 LLM Judge 편향 정보")
+    else:
+        st.markdown("### FYI: How biased is your LLM Judge?")
+    with st.expander("펼쳐서 보기" if st.session_state.korean else "Expand to show"):
+        st.info(
+            """
+Varco Arena에서는 position bias의 영향을 최소화하기 위해 모든 모델이 A나 B위치에 번갈아 위치하도록 하였습니다. 그러나 LLM Judge 혹은 Prompt의 성능이 부족하다고 느껴진다면, 아래 알려진 LLM Judge bias가 참고가 될겁니다.
+* position bias (왼쪽)
+* length bias (오른쪽)
+결과의 왜곡이 LLM Judge의 부족함 때문이었다는 점을 규명하려면 사용하신 LLM Judge와 Prompt의 binary classification 정확도를 측정해보시길 바랍니다 (Varco Arena를 활용하여 이를 수행해볼 수 있습니다!).""".strip()
+            if st.session_state.korean
+            else """
+In Varco Arena, to minimize the effect of position bias, all models are alternately positioned in either position A or B. However, if you feel the LLM Judge or Prompt performance is insufficient, the following known LLM Judge biases may be helpful to reference:
+* position bias (left)
+* length bias (right)
+To determine if result distortion was due to LLM Judge limitations, please measure the binary classification accuracy of your LLM Judge and Prompt (You could use Varco Arena for this purpose!).
+""".strip()
+        )
+        st.markdown(f"#### {judgename} + prompt = {eval_prompt_name}")
+        col1, col2 = st.columns(2)
+        with col1:
+            with st.container(border=True):
+                st.plotly_chart(
+                    figure_dict["counts_of_match_winners"].update_layout(
+                        **default_layout_dict
+                    ),
+                    use_container_width=True,
+                    key=f"{task}_counts_of_match_winners",
+                )
+        with col2:
+            with st.container(border=True):
+                st.plotly_chart(
+                    figure_dict["length_bias"].update_layout(**default_layout_dict),
+                    use_container_width=True,
+                    key=f"{task}_length_bias",
+                )
+                st.table(figure_dict["length_bias_df"].groupby("category").describe().T)
+if __name__ == "__main__":
+    main()
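
The bias note in `see_results.py` recommends checking the judge yourself. One rough, stand-alone way to put numbers on the two biases it names is shown below. It assumes each match row is a dict with the `winner` (`"model_a"` or `"model_b"`), `generated_a` and `generated_b` fields that the page reads from `df_now_processed`. `judge_bias_summary` is a hypothetical helper and is not how the app builds its `counts_of_match_winners` or `length_bias` figures:

```python
import math
from typing import Iterable, Mapping


def judge_bias_summary(rows: Iterable[Mapping[str, str]]) -> dict[str, float]:
    """Position- and length-bias rates over decided matches.

    pos_a_win_rate: share of matches won by whichever response sat in slot A;
        with models alternating between slots, an unbiased judge should land
        near 0.5.
    longer_win_rate: share of unequal-length matches won by the longer
        response (length in characters).
    """
    decided = [r for r in rows if r["winner"] in ("model_a", "model_b")]
    if not decided:
        return {"pos_a_win_rate": math.nan, "longer_win_rate": math.nan}
    a_wins = sum(r["winner"] == "model_a" for r in decided)
    longer_wins = unequal = 0
    for r in decided:
        len_a, len_b = len(r["generated_a"]), len(r["generated_b"])
        if len_a == len_b:
            continue  # no "longer" response to credit
        unequal += 1
        longer = "model_a" if len_a > len_b else "model_b"
        longer_wins += r["winner"] == longer
    return {
        "pos_a_win_rate": a_wins / len(decided),
        "longer_win_rate": longer_wins / unequal if unequal else math.nan,
    }


rows = [
    {"winner": "model_a", "generated_a": "long answer", "generated_b": "short"},
    {"winner": "model_b", "generated_a": "x", "generated_b": "longer text"},
    {"winner": "model_a", "generated_a": "same", "generated_b": "size"},
    {"winner": "model_b", "generated_a": "much longer one", "generated_b": "tiny"},
]
print(judge_bias_summary(rows))  # pos_a_win_rate 0.5, longer_win_rate 2/3
```

Neither rate proves bias on its own. A longer response may simply be better. The binary-classification accuracy check suggested in the page text remains the more direct test.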

streamlit_app_local/query_comp.py ADDED

	@@ -0,0 +1,135 @@

+# Defines widget containers that automatically read and write URL query params
+from __future__ import annotations
+import re
+from typing import Callable, TypeVar
+import streamlit as st
+__all__ = ["QueryWrapper", "get_base_url"]
+T = TypeVar("T")
+import hashlib
+import urllib.parse
+def SHA1(msg: str) -> str:
+    return hashlib.sha1(msg.encode()).hexdigest()[:8]
+def get_base_url():
+    session = st.runtime.get_instance()._session_mgr.list_active_sessions()[0]
+    return urllib.parse.urlunparse(
+        [session.client.request.protocol, session.client.request.host, "", "", "", ""]
+    )
+class QueryWrapper:
+    queries: dict[str, _QueryWrapper] = {}  # registry of all wrappers (used by get_sharable_link)
+    def __init__(self, query: str, label: str | None = None, use_hash: bool = True):
+        self.__wrapper = QueryWrapper.queries[query] = _QueryWrapper(
+            query, label, use_hash
+        )
+    def __call__(self, *args, **kwargs):
+        return self.__wrapper(*args, **kwargs)
+    @classmethod
+    def get_sharable_link(cls):
+        # for k, v in cls.queries.items():
+        #     print(f"{k}: {v}")
+        return re.sub(
+            "&+", "&", "&".join([str(v) for k, v in cls.queries.items()])
+        ).strip("&")
+class _QueryWrapper:
+    ILLEGAL_CHARS = "&/=?"
+    def __init__(self, query: str, label: str | None = None, use_hash: bool = True):
+        self.query = query
+        self.label = label or query
+        self.use_hash = use_hash
+        self.hash_table = {}
+        self.key = None
+    def __call__(
+        self,
+        base_container: Callable,
+        legal_list: list[T],
+        default: T | list[T] | None = None,
+        *,
+        key: str | None = None,
+        **kwargs,
+    ) -> T | list[T] | None:
+        val_from_query = st.query_params.get_all(self.query.lower())
+        # print(val_from_query)
+        legal = len(val_from_query) > 0
+        self.key = key or self.label
+        self.hash_table = {SHA1(str(v)): v for v in legal_list}
+        # filter out illegal values
+        if legal and legal_list:
+            val_from_query = [v for v in val_from_query if v in self.hash_table]
+        # print(self.label, val_from_query, legal)
+        if legal:
+            selected = [self.hash_table[v] for v in val_from_query]
+        elif default:
+            selected = default
+        elif self.label in st.session_state:
+            selected = st.session_state[self.label]
+            if legal_list:
+                if isinstance(selected, list):
+                    selected = [v for v in selected if v in legal_list]
+                elif selected not in legal_list:
+                    selected = []
+        else:
+            selected = []
+        if len(selected) == 1 and base_container in [st.selectbox, st.radio]:
+            selected = selected[0]
+        # print(self.label, selected)
+        if base_container == st.checkbox:
+            selected = base_container(
+                self.label,
+                legal_list,
+                index=legal_list.index(selected) if selected in legal_list else None,
+                key=self.key,
+                **kwargs,
+            )
+        elif base_container == st.multiselect:
+            selected = base_container(
+                self.label, legal_list, default=selected, key=self.key, **kwargs
+            )
+        elif base_container == st.radio:
+            selected = base_container(
+                self.label,
+                legal_list,
+                index=legal_list.index(selected) if selected in legal_list else None,
+                key=self.key,
+                **kwargs,
+            )
+        elif base_container == st.selectbox:
+            selected = base_container(
+                self.label,
+                legal_list,
+                index=legal_list.index(selected) if selected in legal_list else None,
+                key=self.key,
+                **kwargs,
+            )
+        else:
+            selected = base_container(self.label, legal_list, key=self.key, **kwargs)
+        return st.session_state[self.key]
+    def __str__(self):
+        selected = st.session_state.get(self.key, None)
+        if isinstance(selected, str):
+            return f"{self.query.lower()}={SHA1(selected)}"
+        elif isinstance(selected, list):
+            return "&".join([f"{self.query.lower()}={SHA1(str(v))}" for v in selected])
+        else:
+            return ""
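
`QueryWrapper` never writes raw selections into the URL. Each selection is replaced by the first 8 hex characters of its SHA-1 digest. On load, the hash is looked up in a table built from the currently legal options, and a hash that matches no option is filtered out. This keeps characters such as `/` out of the query string (result keys look like `exp_name/prompt_name`). Below is a minimal stand-alone sketch of the round trip. The function names are hypothetical, and `urllib.parse.parse_qs` stands in for `st.query_params.get_all`:

```python
import hashlib
import re
from urllib.parse import parse_qs


def short_sha1(msg: str) -> str:
    # Same scheme as SHA1() in query_comp.py: first 8 hex chars of SHA-1.
    return hashlib.sha1(msg.encode()).hexdigest()[:8]


def encode_param(query: str, selected) -> str:
    """Mirror _QueryWrapper.__str__: str -> one pair, list -> repeated pairs."""
    if isinstance(selected, str):
        return f"{query.lower()}={short_sha1(selected)}"
    if isinstance(selected, list):
        return "&".join(f"{query.lower()}={short_sha1(str(v))}" for v in selected)
    return ""  # nothing selected


def build_link(parts: list[str]) -> str:
    # Empty parts leave runs of '&'; collapse them like get_sharable_link().
    return re.sub("&+", "&", "&".join(parts)).strip("&")


def decode_param(hashes: list[str], legal_list: list) -> list:
    """Map hashes back to legal values; unknown hashes are dropped."""
    table = {short_sha1(str(v)): v for v in legal_list}
    return [table[h] for h in hashes if h in table]


results = ["exp1/prompt_a", "exp2/prompt_b"]
link = build_link([
    encode_param("expname", "exp2/prompt_b"),
    encode_param("task", None),           # unset widget contributes nothing
    encode_param("match", ["3: a vs b"]),
])
params = parse_qs(link)
print(decode_param(params["expname"], results))  # ['exp2/prompt_b']
```

Truncating to 8 hex characters keeps links short. The trade-off is a small chance of collision, which is acceptable here because the lookup only ever runs against a short list of legal options.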

streamlit_app_local/requirements.txt ADDED

	@@ -0,0 +1,5 @@

+streamlit
+plotly
+scikit-learn
+numpy
+pandas

streamlit_app_local/run.sh ADDED

	@@ -0,0 +1,3 @@

+#!/bin/bash
+# Run this from here (cwd = ./streamlit_app_components/)
+VARCO_ARENA_RESULT_PATH=./user_submit streamlit run app.py
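
`run.sh` sets `VARCO_ARENA_RESULT_PATH=./user_submit`, which is presumably read by `app.py` to build `VA_ROOT`. `load_data()` in `see_results.py` then globs every `result.json` under `VA_ROOT` and keys each result by its two parent directory names (`<exp_name>/<prompt_name>`). Below is a minimal sketch of that discovery step. It assumes `VA_ROOT` is a `pathlib.Path` built from the environment variable (`app.py` is not shown here), and `discover_results` is a hypothetical helper:

```python
import os
import tempfile
from pathlib import Path


def discover_results(root: Path) -> dict[str, Path]:
    """Map "<exp_name>/<prompt_name>" to each result.json under root,
    using the same parts[-3] / parts[-2] convention as load_data()."""
    found = {}
    for resjson in sorted(root.glob("**/result.json")):
        exp_name, prm_name = resjson.parts[-3], resjson.parts[-2]
        found[f"{exp_name}/{prm_name}"] = resjson
    return found


# Assumption: the root comes from the env var run.sh sets, defaulting to ./user_submit.
va_root = Path(os.environ.get("VARCO_ARENA_RESULT_PATH", "./user_submit"))

# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    for exp, prm in [("exp1", "prompt_a"), ("exp2", "translation_pair")]:
        d = Path(tmp) / exp / prm
        d.mkdir(parents=True)
        (d / "result.json").write_text("[]")
    print(sorted(discover_results(Path(tmp))))  # ['exp1/prompt_a', 'exp2/translation_pair']
```

Because the key uses only the last two directory names, two runs that share the same `<exp_name>/<prompt_name>` pair at different depths would overwrite each other in the results dict.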

streamlit_app_local/user_submit/12-02_14:29:30/1.jsonl ADDED

	@@ -0,0 +1,6 @@

+{"source":"D: 길드연구 추천 마크 했으니\nD: 다들 누르거라\nC: 길드장님 위치가 어디에요?\nC: 찾았음...\nD: 네 길드 태그 붙였으니까\nD: 그쪽으로 성 이동 하시고\nD: 병력 넣으라고 날래날래\nA: 하이하이\nD: 반갑습니다 어서오십셔\nA: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nA: 이리로 이동해서 오세요\nC: 네네~~~~\nA: 함께해요 ~\nC: 옮겼습니다.\nA: ♡\nD: 글루톤을 많이많이 잡아주셔야\nD: 시타델의 자원이 넘쳐납니다.\nA: 임모탄님\nA: 말투가 너무 나긋나긋하십니다\nA: 체통을 지켜주시옵소서\nD: 너나 잘해\nA: 케켘\nD: 사람 좀 모이면\nD: 파수꾼 잡아여\nB: 공격대 참여 많이 해주세요~!\nD: 공격대 참여해야 정말 자원 넘치는 시타델이 됩니다.\nD: 여러분 퀘스트하시면서 진행하시면 빠른 성장 가능합니다.\nD: 여러분 영주 명예 보상도 받으세요~\nD: 영주 명예 레벨  3 되서 보상 받으면\nA: 넵 임모탄님!!\nD: 건설슬롯 2개!\nA: 글루톤에 가속이 그득그득 들었네요\nA: 한부대씩 공격대 걸어주세요\nA: 헬로우 워보이\nD: 어서오세요~ :)\nA: 친절한 모탄씨\nD: 예쓰\nA: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nA: 헬로우 워뽀이\nD: 길드 요새 근처로 이동해 오세요\nA: 글루톤 참여하시면\nA: 시타델도 풍족해지고 가속도 넘쳐흐르게 됩니다요\nA: 10렙짜리\nA: 강화몬스터 같이 잡으실분\nA: 여기로 부대 이동해주세요 최대한\nA: ShareCoord#빈 땅#145926.609375#523510.437500\nD: 어서오세요\nA: 커몬커몬 워보이\nD: 10레[벨 정예몹 같이 잡아용\nA: 10레벨 강화랩토루스 한번 모여서 잡아볼까요","generated":"* D이 길드 연구 추천을 마크했으니 다들 누르라고 하였습니다.\n* D이 길드 태그 붙였으니 그 쪽으로 성 이동 하고 병력을 날래날래 넣으라고 하였습니다.\n* D이 글루톤을 많이많이 잡아야 시타델의 자원이 넘처난다고 강조했습니다.\n* A이 임모탄님의 말투가 너무 나긋나긋하니 체통을 지켜달라고 하자 A이 너나 잘해라고 하였습니다.\n* D은 사람 좀 모이면 파수꾼을 잡자고 하였습니다.\n* D은 영주 명예 레벨이 3 되어서 보상으로 건설슬롯 2개를 받을 수 있다고 하였습니다.\n* A이 글루톤에 가속이 그득그득 들었으니 한 부대씩 공격대를 걸어달라고 요청했습니다.\n* D와 A이 10레벨 강화랩토루스를 모여서 잡아보자고 하였습니다","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"주어진 채팅에 대한 요약을 작성해줘. 사람들의 발화를 통해서 어떤 상황인지 파악할 수 있었으면 좋겠어. 예시처럼 \"요약:\" 뒤에다 작성해주면 돼.","how_prompted":""}
+{"source":"A: 부대 남으시는분\nA: 8282\nA: 자 쳐볼게요\nA: 화이팅\nA: 발할라로~!!!\nA: 기억해줘!!! 정예 10레벨 잡았다구!!!\nC: ㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋ\nA: 남는 부대 1개씩\nA: 길드 요새 건설에 넣어주세요!\nA: 우리의 시타델을 건설해야합니다!\nC: 웰컴 투 시타델\nA: 헬로우 워뽀이\nC: 모두 1부대는 길드 요새에 넣어달라입니다.\nA: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nC: 시타델의 영토가 시급하다예요\nA: 이리로 오세요 어서~!\nB: 우리 길드요새 빨리 건설 할려면 병력을 더 많이 넣어야 하니까 어서어서 지원해주세요~ ^^\nA: 빨간내복최고~!!\nC: 어서오세요~\nC: ㅎㅎㅎㅎ\nA: 우와\nC: 워터보이님 어서오십셔\nA: 워(터)보이 오셨다\nC: 다들 날래날래 연매용새에 기사단 하나씩 넣으라우\nA: 자 우리 이제\nA: 경험치 3배 이벤트\nA: 달려야합니다\nA: 토스트님 어서오세요~ ㅎㅎ\nA: ShareCoord#빈 땅#155862.171875#524460.062500\nA: 여기로 병력집결해주세요\nA: 오른쪽에 병력슬롯 두번누르면\nA: 전체선택해서 여러 기사단을 한번에 운용할 수 있습니다\nA: 자자 요새 오른쪽에\nA: 모이고 있습니다\nA: 경험치 3배 이벤트\nD: 계속 이동할 수 없는 위치라고 하는데 어떻게 하면 되나요?\nA: 참여해서 기사 레벨 쭉쭉 올리셔요\nA: 그 성을 이리저리옮기면\nA: 초록색으로 성 바닥이 바뀌는데\nA: 그때 이동누르시면 됩니다\nA: 잘 안되면 2~3번정도 트라이해주시면 이동될거에요\nA: ShareCoord#빈 땅#156695.468750#525127.875000\nD: 아...깃발이 떠있어서 눌러봤는데 못간다고 해서 ㅎㅎ;\nA: 아~~\nA: 깃발떠있는곳으로 기사단(부대)는 이동할 수 있어요!\nA: 부대 여러개 보내서 요기 옆에 파수꾼 잡으면\nA: 경험치 폭탄 팡팡 터집니다\nA: 어서오세요\nD: 장애물이 있어서 못간다고 하네요..\nA: ShareCoord#빈 땅#156578.359375#523799.718750","generated":"* C가 시타델의 영토가 시급하니 모두 1부대는 길드 요새에 넣어달라고 하였습니다.\n* A이 요기 옆에 파수꾼을 잡으면 경험치 폭탄 팡팡 터지니 어서 오라고 하였습니다.\n* A이 경험치 3배 이벤트를 달려야 하니 빈 땅으로 병력을 집결해달라고 하였습니다.\n* A이 부대 여러 개 보내서 요기 옆에 파수꾼 잡으면 경험치 폭탄 팡팡 터진다고 하였습니다.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"주어진 채팅에 대한 요약을 작성해줘. 사람들의 발화를 통해서 어떤 상황인지 파악할 수 있었으면 좋겠어. 예시처럼 \"요약:\" 뒤에다 작성해주면 돼.","how_prompted":""}
+{"source":"A: 10시 10분에\nA: 파수꾼사냥 시작할게요\nA: 파수꾼에 경험치 많이 들어있습니다\nC: 10시 10분에 파수꾼 잡습니다.\nC: 다들 오세요\nE: 어디인가요..?\nA: ShareCoord#빈 땅#156509.859375#524728.187500\nA: 요기가 부대 집결지고요!\nA: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nA: 길드요새 근처로 성 이동하신 후에 부대 옮기시면 편합니다\nD: 공격대를 그냥 보내면 되나요?\nA: 기사단을 보내두셨다가\nA: 시작하면 부대를 끌어서 몬스터 위에 커서를 올려두면\nA: 달려가서 공격하는데요\nA: 한타이밍에 같이 잡으면 전투에 참여한 사람한테 보상을 균등하게 줘서\nA: 다같이 몰아잡는게 이득입니다\nA: 10분되어서 파수꾼 한번 달려볼게요~!\nA: 부대 최대한 꺼내서 가져오신 후에\nA: 우측 부대 슬롯 버튼 더블클릭해서\nA: 3~5부대 운용하셔요~!\nA: 저 보시면 5부대 전부\nA: 한방에 운용합니다\nA: 자자 칠게요 파수꾼!!\nA: ShareCoord#파수꾼#155425.500000#527495.250000\nC: 고 합니다.\nA: 공격공격\nA: ShareCoord#파수꾼#159173.750000#528246.125000\nA: 멋쟁이님 커비님\nA: 부대 움직여서 파수꾼 공격하셔요~~!\nB: 같이 가유~~~ 파수꾼\nA: ShareCoord#파수꾼#160192.187500#525461.625000\nA: 어서오세요 ~~\nA: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nA: 성 이동하시구\nA: 파수꾼잡으러 참여하셔요 요쪽으루 '\nA: ShareCoord#파수꾼#165078.687500#525488.375000\nA: 같이 잡아야 병력도 덜죽고 경험치도 같이먹어요~!\nA: 어서오세요오~~~~\nB: 어서오세요~\nA: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nA: 성 이동 좌표\nA: ShareCoord#파수꾼#164864.437500#527752.750000\nC: 어서오세요~ :)\nA: 몬스터 좌표\nA: 계속 파수꾼 진행 중입니더~!!\nA: 어서와서 잡으셔용~~\nA: 부대 회군해서\nA: 정비 한버 ㄴ하고\nA: 다시 달릴게요\nA: 다들 부대 체력이 간당간당하실듯하여..","generated":"* A이 10시 10분에 파수꾼 사냥을 시작한다고 하였습니다.\n* D가 공격대를 그냥 보내면 되냐고 묻자 A은 다같이 몰아잡는게 이득이니 10분 되어서 파수꾼을 달려보자고 하였습니다.\n* A은 부대를 최대한 꺼내서 가져온 후에 우측 부대 슬롯 버튼 더블클릭 해서 3~5부대 운용하라고 하였습니다.\n* A이 파수꾼은 같이 잡아야 병력도 덜 죽고 경험치도 같이 먹을 수 있으니 어서오라고 하였습니다.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"주어진 채팅에 대한 요약을 작성해줘. 사람들의 발화를 통해서 어떤 상황인지 파악할 수 있었으면 좋겠어. 예시처럼 \"요약:\" 뒤에다 작성해주면 돼.","how_prompted":""}
+{"source":"C: 요거 잡고 스탑 잠시 정비하고 오겠습니다.\nA: ShareCoord#빈 땅#159195.468750#530362.062500\nA: 여기 집결할게요!!!\nA: 18분에 스타트 하겠습니다\nA: 부대 회군했다가 다중 편성으로 전부 끌고 나오셔용~!\nA: 다같이 잡아야 경험치 함께먹고 폭업 가능합니다\nA: 어서어서~!! 워뽀이들 빨리 강해져야 나중에 싸울때 재밌습니다~! ㅎㅎㅎ\nA: ShareCoord#빈 땅#159809.593750#529345.625000\nA: 이동시간때문에 20분에 바로 시작할게요\nA: 10시 20분엔 그냥 바로 스타트합니답\nA: 와주셔서 감사합니다 ㅎㅎ 헤헤\nA: 자 칠게요\nA: 남은거 다잡겠습니다\nD: 가나요?\nA: gogo!!!!\nA: ShareCoord#빈 땅#159518.640625#532655.562500\nA: ShareCoord#파수꾼#157515.046875#534659.812500\nA: gogo!!!\nA: ShareCoord#에테르 원석#160185.000000#531312.000000\nA: 파수꾼잡고떨어진\nA: 에테르 원석 이거 시간제 버프같으니까\nB: 건물 짓거나 병사 뽑을 때 손모양 도움 누르면 시간이 빨라집니다.\nA: 필요하신거 빠르게 채집해서 버프받으세요\nB: 적극적으로 활용해주세요\nC: 파수꾼 사냥 수고하셨습니다.\nC: 에테르 원석 활용해주세요! 인터렉션하면 버프 받습니다.\nC: 다시 글루톤 사냥 갑시당\nC: 길드 연구도 시작했으니\nC: 기부 많이 해주세요\nC: 기부를 하면 개인 포인트도 얻으니까 상부상조\nA: 오후쯤이면 쟁한번 하지 싶네요\nC: 다들 발할라 가자\nA: 성장 호다닥 해주시구 궁금한거 물어봐주세요\nA: 도움 파파팍 눌러주시구욥~!\nA: 강한 워뽀이가 되자!!!!!!!\nB: 길드 요새 60% 넘었네요.\nB: 영차 영차 영차~\nA: 어우 달엉 도움 감사합니다\nC: ㅎㅎㅎㅎ\nD: ㅋㅋㅋㅋㅋ\nA: 도움도움~!!! 최고~!!!!!!\nA: 가속 아끼지 마시구 팍팍쓰세요\nA: 마지막날까지 가속 다 쓰지도 못합니다 ㄷㄷ\nA: 도움-가속-도움-가속\nA: 영-차-영-차\nC: (흡족)\nC: 시타델 요새 완성까지 3분남았습니다.\nA: 워뽀이 여러분들\nA: 궁금하거나 막히는 것들 말씀주시면\nA: 다 알려드리고 다 도와드립니다","generated":"* A이 파수꾼을 잡고 떨어진 에테르 원석은 시간제 버프로 건물 짓거나 병사 뽑을 때 손 모양 도움을 누르면 시간이 빨라지니 필요한 것을 빠르게 채집해 버프받으라고 하였습니다.\n* C가 길드 연구에 기부하면 개인 포인트도 얻을 수 있으니 기부를 많이 해달라고 당부하였습니다.\n* A이 오후 쯤 쟁 한번 하지 싶으니 성장 호다닥 하고 궁금한 것 물어봐달라고 하였습니다.\n* B이 길드 요새가 60%를 넘었다고 하였습니다.\n* A이 궁금하거나 막히는 것들 말씀 주시면 다 알려드리고 도와드린다고 하였습니다.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"주어진 채팅에 대한 요약을 작성해줘. 사람들의 발화를 통해서 어떤 상황인지 파악할 수 있었으면 좋겠어. 예시처럼 \"요약:\" 뒤에다 작성해주면 돼.","how_prompted":""}
+{"source":"E: 여러분!\nE: 글루톤 많이 잡아주세요!\nE: 길드 자원이 부 족합니다.\nE: 젖과꿀이 흐르는 시타델이 되고 싶다?\nE: 글루톤을 많이 잡아주십쇼\nG: 가난한 시타델은 잇을 수 없습니다\nE: 플리즈\nB: 강화몬스터 미션 걸리신분?\nB: 같이 잡으실분 계신가요 ~\nF: 좌표주세요\nB: ShareCoord#빈 땅#155225.781250#534257.875000\nB: ShareCoord#빈 땅#154322.968750#533640.812500\nB: 강화몬스터 잡으실분 구해요~~~\nB: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nB: 출발합니다\nB: 강화몬스터~!\nB: 나이스커비님\nF: ㅎㅎ\nB: 감사합니닷\nF: 굿굿\nB: 너른 벌판을 뛰놀고 싶은\nC: 길드 연구 기부 부탁드립니다.\nB: 워보이의 마음을 아는가\nC: 길드 보시면 길드 연구 있어요~\nB: ShareCoord#1레벨 드래곤 레어#138900.000000#535800.000000\nB: 전초기지 짓고 드래곤 알 한번 얻어볼게요~~\nB: 알을 얻고 좀 키워야 용쓸 수 있습니다~~~!!\nC: 전초기지 건설에 기사단 1개씩 지원 부탁드립니다\nE: 글루톤도 많이 잡아주세요\nE: 길드 자원을 모아야 합니당\nB: 넵 임모탄님\nB: 충성충성충성 XXXXXXXXXXX\nE: 가즈아 발할라!\nB: 활동 포인트 얻는 법\nB: 1. 길드 > 연구 > 길드 연구 기부(추천 연구)\nB: 2. 길드 본부 > 도움\nB: 활동 포인트 사용법\nB: 1. 길드 > 길드 상점 > 상품 구매(기사 경험치 꿀)\nB: 기사렙업빨리해서 괴쎈 몬스터 빨리잡고 고성장합세다~!!\nB: 도움 고맙습니다 헤헤\nB: 곧 식사시간일텐데요\nB: 전초기지에 기사단 1개 넣어주시구\nB: 나머지 부대는 2레벨 이상 채집지에 넣어두신 후 식사다녀오시면\nB: 이벤트도 클리어되고 자원도 빵빵해집니다~~~\nB: 안녕하세요~~~~\nB: 궁금한거 물어보셔요~~~ 다 알려드립니다 ~~\nD: 우리 전초기지 3분 남았는데.\nA: 정찰은 어떻게 보내나요\nD: 좀만 더 영차영차 해주시면\nD: 남의 성 찍고 정찰 버튼 클릭하니까 정찰 보내졌어요.","generated":"* E가 젖과꿀이 흐르는 시타델이 되고 싶다면 글루톤을 많이 잡아달라고 부탁했습니다.\n* B이 강화몬스터 미션 걸리신 분 같이 잡자고 하였습니다.\n* B이 전초기지를 짓고 드래곤 알을 얻어보겠다고 하였습니다.\n* E가 길드 자원을 모아야 한다고 하였습니다.\n* B 활동 포인트 얻는 법과 활동 포인트 사용법을 공유하였습니다.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"주어진 채팅에 대한 요약을 작성해줘. 사람들의 발화를 통해서 어떤 상황인지 파악할 수 있었으면 좋겠어. 예시처럼 \"요약:\" 뒤에다 작성해주면 돼.","how_prompted":""}
+{"source":"D: 보호막있는 성 빼고\nD: 다른 사람이 채집하는 곳도 정찰 되던데요.\nC: 정찰은 마을이나 타인의 성을 누르면 정찰 되는데요\nE: 정찰은 영지에 정찰 본부를 건설해야합니다\nC: ShareCoord#2레벨 프로잔 마을#148500.000000#524397.000000\nC: 미션떄때문이시면 이 마을이나\nA: 오 파랑새 날아간다\nC: 오 되셨나요\nA: ㅎㅎ 네 ㄳㄳ\nD: 파랑새...\nC: 사실부엉이입니다 ㅋㅋ\nC: ㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋ\nA: 딱다구리라고 해도 믿을 주둥이 길이지만 그렇다고 하니 믿겠습니다\nC: ㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋ\nA: 판게아. 그 신비의 세계\nC: 입이 삐죽 나왔어요\nC: 삐진듯요\nA: 다신 정찰 안보낼게 ㄷ\nE: ㅎㅎㅎㅎㅎㅎㅎㅎ\nC: ㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋ\nD: 끝\nC: 군주님 계신가요\nC: 스트릿 군주님\nC: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nC: 요새 근처로 오시면 글뤁루톤 공격대 집결이 훨씬 용이합니다!!\nC: 이쪽으로 오셔용~!\nC: 좌표 근처 땅을 꾸욱 누르신 후 성 이동을 누르시면 성이 이전됩니다\nE: 1시에 파수꾼 사냥 갑니다.\nE: 태그 달았으니 위치 확인해주세요 :)\nB: 안녕하세요 가입인사 드립니다\nE: 안녕하세요! 반갑습니당 :)\nC: 안녕하세요~~~\nC: 1시 파수꾼 확인했습니다\nE: 예압\nF: 안녕하세요 반갑습니다\nC: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nE: 어서오십쇼!\nE: 젖과 꿀이 흐르는 시타델입니다.\nE: 길드 요새 근처로 성 이동 해주세요! :)\nD: 네~~~~~~\nC: ShareCoord#[STOP]전초기지#-152190.000000#492921.000000\nC: 우리도 요기처럼 싸우게 될지도 몰라요 ㅎㅎ\nC: 무력을 빨리 키워둡시다~!!\nC: 헐!\nC: 커비님 혹시 시타델 버리고 가시나요\nC: ShareCoord#[CITA]남의커비#-155181.000000#523710.000000\nC: 들켰으니 복귀하시죵\nC: ㅎㅎㅎㅎㅎㅎㅎ\nC: 탐사 준비하세요\nC: ShareCoord#[INSD]불안이#-148347.000000#488490.000000","generated":"* D이 다른 사람이 채집하는 곳도 정찰이 가능하다고 하였습니다.\n* C이 싸우게 될지 모르니 무력을 빨리 키워두자고 제안했습니다.\n* C이 커비님에게 시타델 버리고 가시냐며 들켰으니 복귀하라고 제안했습니다.\n* C이 탐사를 준비하자고 하였습니다.","model_id":"human+alphabet+bullet","task":"tactan_chat","instruction":"주어진 채팅에 대한 요약을 작성해줘. 사람들의 발화를 통해서 어떤 상황인지 파악할 수 있었으면 좋겠어. 예시처럼 \"요약:\" 뒤에다 작성해주면 돼.","how_prompted":""}
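
Each line in these `user_submit` files is one model's response to one test prompt. The fields are `source`, `generated`, `model_id`, `task`, `instruction` and `how_prompted`, and `1.jsonl` holds the `human+alphabet+bullet` outputs. A bracket presumably can only compare models on prompts that every model answered, so a quick pre-upload consistency check is useful. Below is a minimal sketch. It assumes a prompt is identified by its (`task`, `instruction`, `source`) triple and treats the listed keys as required. Both are assumptions, not the app's own validation, and `check_submission` is a hypothetical helper:

```python
import json
from collections import defaultdict

REQUIRED_KEYS = {"source", "generated", "model_id", "task", "instruction"}  # assumption


def check_submission(jsonl_texts: list[str]) -> dict[str, set]:
    """Parse JSONL strings and return, per model_id, the prompts it is missing.

    A prompt is keyed by (task, instruction, source); that key is an assumption.
    """
    prompts_by_model = defaultdict(set)
    for text in jsonl_texts:
        for lineno, line in enumerate(text.splitlines(), start=1):
            if not line.strip():
                continue  # tolerate blank lines
            rec = json.loads(line)
            missing = REQUIRED_KEYS - set(rec)
            if missing:
                raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
            prompts_by_model[rec["model_id"]].add(
                (rec["task"], rec["instruction"], rec["source"])
            )
    all_prompts = set().union(*prompts_by_model.values())
    return {m: all_prompts - p for m, p in prompts_by_model.items()}


# Usage with two tiny hypothetical files: m2 skipped the "B: bye" prompt.
file_1 = (
    '{"source": "A: hi", "generated": "...", "model_id": "m1", "task": "tactan_chat", "instruction": "summarize", "how_prompted": ""}\n'
    '{"source": "B: bye", "generated": "...", "model_id": "m1", "task": "tactan_chat", "instruction": "summarize", "how_prompted": ""}\n'
)
file_2 = '{"source": "A: hi", "generated": "...", "model_id": "m2", "task": "tactan_chat", "instruction": "summarize", "how_prompted": ""}\n'
gaps = check_submission([file_1, file_2])
print({m: len(p) for m, p in gaps.items()})  # {'m1': 0, 'm2': 1}
```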

streamlit_app_local/user_submit/12-02_14:29:30/2.jsonl ADDED

	@@ -0,0 +1,6 @@

+{"generated":"* D길드장이 길드연구 태그를 추천하며, C가 입장했을 때 위치를 안내합니다.\n* A가 D길드장에게 인사를 건네고, C가 길드로 이동합니다.\n* D길드장은 글루톤 잡는것을 독려하며 시타델 자원 풍부해짐을 안내합니다.\n* B, D 공격대 참여를 독려하고 퀘스트 진행을 권장하며 영주 명예 보상의 존재를 설명합니다.\n* A는 글루톤에 가속이 풍부하다는 점을 강조하고, 부대 이동을 안내하며, 웜보이와 그의 친구를 맞이합니다.\n* D길드장은 10레벨 강화몬스터(랩토루스)를 함께 잡을 것을 제안합니다.","instruction":"주어진 채팅에 대한 요약을 작성해줘. 사람들의 발화를 통해서 어떤 상황인지 파악할 수 있었으면 좋겠어. 예시처럼 \"요약:\" 뒤에다 작성해주면 돼.","source":"D: 길드연구 추천 마크 했으니\nD: 다들 누르거라\nC: 길드장님 위치가 어디에요?\nC: 찾았음...\nD: 네 길드 태그 붙였으니까\nD: 그쪽으로 성 이동 하시고\nD: 병력 넣으라고 날래날래\nA: 하이하이\nD: 반갑습니다 어서오십셔\nA: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nA: 이리로 이동해서 오세요\nC: 네네~~~~\nA: 함께해요 ~\nC: 옮겼습니다.\nA: ♡\nD: 글루톤을 많이많이 잡아주셔야\nD: 시타델의 자원이 넘쳐납니다.\nA: 임모탄님\nA: 말투가 너무 나긋나긋하십니다\nA: 체통을 지켜주시옵소서\nD: 너나 잘해\nA: 케켘\nD: 사람 좀 모이면\nD: 파수꾼 잡아여\nB: 공격대 참여 많이 해주세요~!\nD: 공격대 참여해야 정말 자원 넘치는 시타델이 됩니다.\nD: 여러분 퀘스트하시면서 진행하시면 빠른 성장 가능합니다.\nD: 여러분 영주 명예 보상도 받으세요~\nD: 영주 명예 레벨  3 되서 보상 받으면\nA: 넵 임모탄님!!\nD: 건설슬롯 2개!\nA: 글루톤에 가속이 그득그득 들었네요\nA: 한부대씩 공격대 걸어주세요\nA: 헬로우 워보이\nD: 어서오세요~ :)\nA: 친절한 모탄씨\nD: 예쓰\nA: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nA: 헬로우 워뽀이\nD: 길드 요새 근처로 이동해 오세요\nA: 글루톤 참여하시면\nA: 시타델도 풍족해지고 가속도 넘쳐흐르게 됩니다요\nA: 10렙짜리\nA: 강화몬스터 같이 잡으실분\nA: 여기로 부대 이동해주세요 최대한\nA: ShareCoord#빈 땅#145926.609375#523510.437500\nD: 어서오세요\nA: 커몬커몬 워보이\nD: 10레[벨 정예몹 같이 잡아용\nA: 10레벨 강화랩토루스 한번 모여서 잡아볼까요","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
+{"generated":"* A는 길드가 건설하는 시타델 기념 첫 건축물인 길드 요새로 부대를 이동시켜달라고 합니다. \n* 시타델 건설을 축하하며 모두가 참여하도록 유도합니다.\n* 다음으로 경험치 3배 이벤트를 알려주고 \n   * 참여 방법을 설명하며 더 쉽게 이해하도록 도와줍니다.\n* 람층이 좀 낯설어 보이는 D에게 이벤트 지역으로 이동하려면 깃발 옆으로 부대를 보내도록 안내하고, \n* 경험치를 빠르게 얻기 위해 파수꾼을 잡으면 이벤트 효율이 높아진다고 설명합니다.","instruction":"주어진 채팅에 대한 요약을 작성해줘. 사람들의 발화를 통해서 어떤 상황인지 파악할 수 있었으면 좋겠어. 예시처럼 \"요약:\" 뒤에다 작성해주면 돼.","source":"A: 부대 남으시는분\nA: 8282\nA: 자 쳐볼게요\nA: 화이팅\nA: 발할라로~!!!\nA: 기억해줘!!! 정예 10레벨 잡았다구!!!\nC: ㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋ\nA: 남는 부대 1개씩\nA: 길드 요새 건설에 넣어주세요!\nA: 우리의 시타델을 건설해야합니다!\nC: 웰컴 투 시타델\nA: 헬로우 워뽀이\nC: 모두 1부대는 길드 요새에 넣어달라입니다.\nA: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nC: 시타델의 영토가 시급하다예요\nA: 이리로 오세요 어서~!\nB: 우리 길드요새 빨리 건설 할려면 병력을 더 많이 넣어야 하니까 어서어서 지원해주세요~ ^^\nA: 빨간내복최고~!!\nC: 어서오세요~\nC: ㅎㅎㅎㅎ\nA: 우와\nC: 워터보이님 어서오십셔\nA: 워(터)보이 오셨다\nC: 다들 날래날래 연매용새에 기사단 하나씩 넣으라우\nA: 자 우리 이제\nA: 경험치 3배 이벤트\nA: 달려야합니다\nA: 토스트님 어서오세요~ ㅎㅎ\nA: ShareCoord#빈 땅#155862.171875#524460.062500\nA: 여기로 병력집결해주세요\nA: 오른쪽에 병력슬롯 두번누르면\nA: 전체선택해서 여러 기사단을 한번에 운용할 수 있습니다\nA: 자자 요새 오른쪽에\nA: 모이고 있습니다\nA: 경험치 3배 이벤트\nD: 계속 이동할 수 없는 위치라고 하는데 어떻게 하면 되나요?\nA: 참여해서 기사 레벨 쭉쭉 올리셔요\nA: 그 성을 이리저리옮기면\nA: 초록색으로 성 바닥이 바뀌는데\nA: 그때 이동누르시면 됩니다\nA: 잘 안되면 2~3번정도 트라이해주시면 이동될거에요\nA: ShareCoord#빈 땅#156695.468750#525127.875000\nD: 아...깃발이 떠있어서 눌러봤는데 못간다고 해서 ㅎㅎ;\nA: 아~~\nA: 깃발떠있는곳으로 기사단(부대)는 이동할 수 있어요!\nA: 부대 여러개 보내서 요기 옆에 파수꾼 잡으면\nA: 경험치 폭탄 팡팡 터집니다\nA: 어서오세요\nD: 장애물이 있어서 못간다고 하네요..\nA: ShareCoord#빈 땅#156578.359375#523799.718750","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
+{"generated":"* A는 길드에서 파수꾼 사냥 이벤트를 10시 10분에 시작한다고 알리고 사냥 장소 위치 정보를 공유합니다. \n* 초보자를 위한 안내로, 길드 요새 근처로 이동한 후 부대를 이동시켜 몬스터 위에 커서를 놓고 공격하는 전략을 설명합니다. 공동전투에 참여하면 경험치가 균등하게 나눠집니다.\n* A는 플레이어들이 부대를 운용하고 파수꾼을 공격할 준비를 하도록 지시합니다.\n* 파수꾼 위치를 계속 공유하며 부대가 없어지면 회수 후 다시 진행한다고 합니다.","instruction":"주어진 채팅에 대한 요약을 작성해줘. 사람들의 발화를 통해서 어떤 상황인지 파악할 수 있었으면 좋겠어. 예시처럼 \"요약:\" 뒤에다 작성해주면 돼.","source":"A: 10시 10분에\nA: 파수꾼사냥 시작할게요\nA: 파수꾼에 경험치 많이 들어있습니다\nC: 10시 10분에 파수꾼 잡습니다.\nC: 다들 오세요\nE: 어디인가요..?\nA: ShareCoord#빈 땅#156509.859375#524728.187500\nA: 요기가 부대 집결지고요!\nA: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nA: 길드요새 근처로 성 이동하신 후에 부대 옮기시면 편합니다\nD: 공격대를 그냥 보내면 되나요?\nA: 기사단을 보내두셨다가\nA: 시작하면 부대를 끌어서 몬스터 위에 커서를 올려두면\nA: 달려가서 공격하는데요\nA: 한타이밍에 같이 잡으면 전투에 참여한 사람한테 보상을 균등하게 줘서\nA: 다같이 몰아잡는게 이득입니다\nA: 10분되어서 파수꾼 한번 달려볼게요~!\nA: 부대 최대한 꺼내서 가져오신 후에\nA: 우측 부대 슬롯 버튼 더블클릭해서\nA: 3~5부대 운용하셔요~!\nA: 저 보시면 5부대 전부\nA: 한방에 운용합니다\nA: 자자 칠게요 파수꾼!!\nA: ShareCoord#파수꾼#155425.500000#527495.250000\nC: 고 합니다.\nA: 공격공격\nA: ShareCoord#파수꾼#159173.750000#528246.125000\nA: 멋쟁이님 커비님\nA: 부대 움직여서 파수꾼 공격하셔요~~!\nB: 같이 가유~~~ 파수꾼\nA: ShareCoord#파수꾼#160192.187500#525461.625000\nA: 어서오세요 ~~\nA: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nA: 성 이동하시구\nA: 파수꾼잡으러 참여하셔요 요쪽으루 '\nA: ShareCoord#파수꾼#165078.687500#525488.375000\nA: 같이 잡아야 병력도 덜죽고 경험치도 같이먹어요~!\nA: 어서오세요오~~~~\nB: 어서오세요~\nA: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nA: 성 이동 좌표\nA: ShareCoord#파수꾼#164864.437500#527752.750000\nC: 어서오세요~ :)\nA: 몬스터 좌표\nA: 계속 파수꾼 진행 중입니더~!!\nA: 어서와서 잡으셔용~~\nA: 부대 회군해서\nA: 정비 한버 ㄴ하고\nA: 다시 달릴게요\nA: 다들 부대 체력이 간당간당하실듯하여..","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
+{"generated":"* A는 딜레이 페이지 카운트다운을 시작하기 전에 플레이어들을 모아놓고 준비 상황을 안내합니다. \n* 파수꾼과 에테르 원석 출현 위치를 공유하고, \n파수꾼을 잡고 에테르 원석을 활용하면 시간 제한 버프를 얻을 수 있음을 알립니다.\n* '도움' & '가속' 도구를 적극적으로 사용하라는 조언도 덧붙이고, C가 시타델 건설이 3분 남았다고 알리며 플레이어가 조금씩 역할을 마치고 상황이 진행되는 모습을 보여줍니다.\n* 플레이어들은 서로 응원하고, 게임 진행 상황을 나누며 갈 길을 계획합니다.","instruction":"주어진 채팅에 대한 요약을 작성해줘. 사람들의 발화를 통해서 어떤 상황인지 파악할 수 있었으면 좋겠어. 예시처럼 \"요약:\" 뒤에다 작성해주면 돼.","source":"C: 요거 잡고 스탑 잠시 정비하고 오겠습니다.\nA: ShareCoord#빈 땅#159195.468750#530362.062500\nA: 여기 집결할게요!!!\nA: 18분에 스타트 하겠습니다\nA: 부대 회군했다가 다중 편성으로 전부 끌고 나오셔용~!\nA: 다같이 잡아야 경험치 함께먹고 폭업 가능합니다\nA: 어서어서~!! 워뽀이들 빨리 강해져야 나중에 싸울때 재밌습니다~! ㅎㅎㅎ\nA: ShareCoord#빈 땅#159809.593750#529345.625000\nA: 이동시간때문에 20분에 바로 시작할게요\nA: 10시 20분엔 그냥 바로 스타트합니답\nA: 와주셔서 감사합니다 ㅎㅎ 헤헤\nA: 자 칠게요\nA: 남은거 다잡겠습니다\nD: 가나요?\nA: gogo!!!!\nA: ShareCoord#빈 땅#159518.640625#532655.562500\nA: ShareCoord#파수꾼#157515.046875#534659.812500\nA: gogo!!!\nA: ShareCoord#에테르 원석#160185.000000#531312.000000\nA: 파수꾼잡고떨어진\nA: 에테르 원석 이거 시간제 버프같으니까\nB: 건물 짓거나 병사 뽑을 때 손모양 도움 누르면 시간이 빨라집니다.\nA: 필요하신거 빠르게 채집해서 버프받으세요\nB: 적극적으로 활용해주세요\nC: 파수꾼 사냥 수고하셨습니다.\nC: 에테르 원석 활용해주세요! 인터렉션하면 버프 받습니다.\nC: 다시 글루톤 사냥 갑시당\nC: 길드 연구도 시작했으니\nC: 기부 많이 해주세요\nC: 기부를 하면 개인 포인트도 얻으니까 상부상조\nA: 오후쯤이면 쟁한번 하지 싶네요\nC: 다들 발할라 가자\nA: 성장 호다닥 해주시구 궁금한거 물어봐주세요\nA: 도움 파파팍 눌러주시구욥~!\nA: 강한 워뽀이가 되자!!!!!!!\nB: 길드 요새 60% 넘었네요.\nB: 영차 영차 영차~\nA: 어우 달엉 도움 감사합니다\nC: ㅎㅎㅎㅎ\nD: ㅋㅋㅋㅋㅋ\nA: 도움도움~!!! 최고~!!!!!!\nA: 가속 아끼지 마시구 팍팍쓰세요\nA: 마지막날까지 가속 다 쓰지도 못합니다 ㄷㄷ\nA: 도움-가속-도움-가속\nA: 영-차-영-차\nC: (흡족)\nC: 시타델 요새 완성까지 3분남았습니다.\nA: 워뽀이 여러분들\nA: 궁금하거나 막히는 것들 말씀주시면\nA: 다 알려드리고 다 도와드립니다","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
+{"generated":"* E는 길드 자원이 부족하다는 이유로 글루톤을 많이 잡아달라고 합니다. \n* B는 강화몬스터 미션을 하고 있으며 몇몇 사람들을 모집합니다. B는 다른 활동을 안내하며 F와 함께 미션을 완료합니다.\n* C는 길드 연구 기부를 부탁하며 전초기지 건설을 지원하기를 요청합니다. B는 C의 요청을 받아들여 전초기지 건설에 기사단을 배치하고 2레벨 이상의 부대들을 채집지로 보내 자원 확보를 위한 계획을 설명합니다.\n* D는 전초기지가 곧 완성이 될 것이라고 말하며, 다른 성의 정찰을 어떻게 보낸지에 대해 A에게 문의합니다.","instruction":"주어진 채팅에 대한 요약을 작성해줘. 사람들의 발화를 통해서 어떤 상황인지 파악할 수 있었으면 좋겠어. 예시처럼 \"요약:\" 뒤에다 작성해주면 돼.","source":"E: 여러분!\nE: 글루톤 많이 잡아주세요!\nE: 길드 자원이 부 족합니다.\nE: 젖과꿀이 흐르는 시타델이 되고 싶다?\nE: 글루톤을 많이 잡아주십쇼\nG: 가난한 시타델은 잇을 수 없습니다\nE: 플리즈\nB: 강화몬스터 미션 걸리신분?\nB: 같이 잡으실분 계신가요 ~\nF: 좌표주세요\nB: ShareCoord#빈 땅#155225.781250#534257.875000\nB: ShareCoord#빈 땅#154322.968750#533640.812500\nB: 강화몬스터 잡으실분 구해요~~~\nB: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nB: 출발합니다\nB: 강화몬스터~!\nB: 나이스커비님\nF: ㅎㅎ\nB: 감사합니닷\nF: 굿굿\nB: 너른 벌판을 뛰놀고 싶은\nC: 길드 연구 기부 부탁드립니다.\nB: 워보이의 마음을 아는가\nC: 길드 보시면 길드 연구 있어요~\nB: ShareCoord#1레벨 드래곤 레어#138900.000000#535800.000000\nB: 전초기지 짓고 드래곤 알 한번 얻어볼게요~~\nB: 알을 얻고 좀 키워야 용쓸 수 있습니다~~~!!\nC: 전초기지 건설에 기사단 1개씩 지원 부탁드립니다\nE: 글루톤도 많이 잡아주세요\nE: 길드 자원을 모아야 합니당\nB: 넵 임모탄님\nB: 충성충성충성 XXXXXXXXXXX\nE: 가즈아 발할라!\nB: 활동 포인트 얻는 법\nB: 1. 길드 > 연구 > 길드 연구 기부(추천 연구)\nB: 2. 길드 본부 > 도움\nB: 활동 포인트 사용법\nB: 1. 길드 > 길드 상점 > 상품 구매(기사 경험치 꿀)\nB: 기사렙업빨리해서 괴쎈 몬스터 빨리잡고 고성장합세다~!!\nB: 도움 고맙습니다 헤헤\nB: 곧 식사시간일텐데요\nB: 전초기지에 기사단 1개 넣어주시구\nB: 나머지 부대는 2레벨 이상 채집지에 넣어두신 후 식사다녀오시면\nB: 이벤트도 클리어되고 자원도 빵빵해집니다~~~\nB: 안녕하세요~~~~\nB: 궁금한거 물어보셔요~~~ 다 알려드립니다 ~~\nD: 우리 전초기지 3분 남았는데.\nA: 정찰은 어떻게 보내나요\nD: 좀만 더 영차영차 해주시면\nD: 남의 성 찍고 정찰 버튼 클릭하니까 정찰 보내졌어요.","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}
+{"generated":"* 여러 사람이 함께 게임을 하고 있다. \n* A가 이스라엘에 와서 이벤트에 참여하라는 부탁을 받고, 그 후 다른 플레이어들과 대화를 나눈다.\n* C는 게임 내 마을의 위치 정보를 공유하고, 마을을 방문하라고 다른 플레이어 D에게 권유한다. \n* E는 1시에 파수꾼 사냥을 한다고 알리고 좌표를 공유한다. \n* 새로운 플레이어 B와 F가 가입하면 C와 E는 환영한다.\n* C는 플레이어 D가 다른 지역으로 이동했는데, 이에 대해 농담을 나눈다.\n* 마지막으로 C는 남의커비(혹은 커비?)라는 새로운 폴리곤의 위치를 공개하고 신비로운 탐사를 준비하자고 제안한다.","instruction":"주어진 채팅에 대한 요약을 작성해줘. 사람들의 발화를 통해서 어떤 상황인지 파악할 수 있었으면 좋겠어. 예시처럼 \"요약:\" 뒤에다 작성해주면 돼.","source":"D: 보호막있는 성 빼고\nD: 다른 사람이 채집하는 곳도 정찰 되던데요.\nC: 정찰은 마을이나 타인의 성을 누르면 정찰 되는데요\nE: 정찰은 영지에 정찰 본부를 건설해야합니다\nC: ShareCoord#2레벨 프로잔 마을#148500.000000#524397.000000\nC: 미션떄때문이시면 이 마을이나\nA: 오 파랑새 날아간다\nC: 오 되셨나요\nA: ㅎㅎ 네 ㄳㄳ\nD: 파랑새...\nC: 사실부엉이입니다 ㅋㅋ\nC: ㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋ\nA: 딱다구리라고 해도 믿을 주둥이 길이지만 그렇다고 하니 믿겠습니다\nC: ㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋ\nA: 판게아. 그 신비의 세계\nC: 입이 삐죽 나왔어요\nC: 삐진듯요\nA: 다신 정찰 안보낼게 ㄷ\nE: ㅎㅎㅎㅎㅎㅎㅎㅎ\nC: ㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋ\nD: 끝\nC: 군주님 계신가요\nC: 스트릿 군주님\nC: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nC: 요새 근처로 오시면 글뤁루톤 공격대 집결이 훨씬 용이합니다!!\nC: 이쪽으로 오셔용~!\nC: 좌표 근처 땅을 꾸욱 누르신 후 성 이동을 누르시면 성이 이전됩니다\nE: 1시에 파수꾼 사냥 갑니다.\nE: 태그 달았으니 위치 확인해주세요 :)\nB: 안녕하세요 가입인사 드립니다\nE: 안녕하세요! 반갑습니당 :)\nC: 안녕하세요~~~\nC: 1시 파수꾼 확인했습니다\nE: 예압\nF: 안녕하세요 반갑습니다\nC: ShareCoord#[CITA]길드요새#149559.000000#521478.000000\nE: 어서오십쇼!\nE: 젖과 꿀이 흐르는 시타델입니다.\nE: 길드 요새 근처로 성 이동 해주세요! :)\nD: 네~~~~~~\nC: ShareCoord#[STOP]전초기지#-152190.000000#492921.000000\nC: 우리도 요기처럼 싸우게 될지도 몰라요 ㅎㅎ\nC: 무력을 빨리 키워둡시다~!!\nC: 헐!\nC: 커비님 혹시 시타델 버리고 가시나요\nC: ShareCoord#[CITA]남의커비#-155181.000000#523710.000000\nC: 들켰으니 복귀하시죵\nC: ㅎㅎㅎㅎㅎㅎㅎ\nC: 탐사 준비하세요\nC: ShareCoord#[INSD]불안이#-148347.000000#488490.000000","task":"tactan_chat","model_id":"50turn_gemma_best","how_prompted":""}

streamlit_app_local/user_submit/llm/240413_dora.jsonl ADDED

	@@ -0,0 +1,6 @@

+{"model_id": "240413_dora", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "난 여름보다는 겨울이 더 좋아."}
+{"model_id": "240413_dora", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "근데 난 여름보다 겨울이 더 좋아. 왜냐하면 살짝 추운 날씨가 옷 입기도 쉽고, 푹 쉬고 있을 수도 있어서 좋거든."}
+{"model_id": "240413_dora", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다 겨울이 좋아. 왜냐하면 겨울에는 밖에 나가서 놀기도 편하고, 눈 덮인 산이나 호수를 볼 수 있어서 좋기 때문이야."}
+{"model_id": "240413_dora", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘은 학교라는 곳에 방문했다. 그리고 집에 돌아와서는 바로 숙제 문제집을 풀기 시작했는데, 이게 지금까지 했던 것 중에서 가장 집중이 잘 됐다. 지금은 숙제도 마치고 슬슬 잠자리에 드는 중이다."}
+{"model_id": "240413_dora", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘은 하루 종일 학교에 갔다 왔다. 집에 와서 잠깐 앉아 휴식을 취하고, 숙제도 마저 다 끝냈다. 오늘은 책도 많이 읽고, 마음도 편하게 쉬어야겠다."}
+{"model_id": "240413_dora", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘은 학교에 갔다. 집에 와서 쉬는 시간에 공부를 하며 하루를 마무리 지었다."}

streamlit_app_local/user_submit/llm/240413_dora_epoch_5.jsonl ADDED

	@@ -0,0 +1,6 @@

+{"model_id": "240413_dora_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다는 겨울이 더 마음에 들어."}
+{"model_id": "240413_dora_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 아무래도 여름보다는 겨울이 더 좋다. 이유는 물론 추워서이기도 하지만, 내가 피부로 느끼는 것들 때문이다. 특히나 눈은 정말 하얗고 깨끗하다. 이것은 겨울에만 볼 수 있는 가장 아름다운 풍경 중 하나다."}
+{"model_id": "240413_dora_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다는 겨울이 더 좋아."}
+{"model_id": "240413_dora_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘 나는 학교에 방문했다. 집에 돌아와서 열심히 공부했는데 내일부터는 또 다른 숙제들이 기다리고 있을 것이다."}
+{"model_id": "240413_dora_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "아 오늘은 정말 바쁜 하루였다. 학교에 가서 열심히 공부했다. 그리고 집에 돌아와서는 숙제도 다 끝내놓고 자기로 했다."}
+{"model_id": "240413_dora_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘 나는 학교에 갔었다. 집에 와서 공부를 했다. 그리고 오늘 숙제도 다 마치고, 이제 잘 준비하려고 한다~"}

streamlit_app_local/user_submit/llm/240413_full.jsonl ADDED

	@@ -0,0 +1,6 @@

+{"model_id": "240413_full", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다는 겨울이 훨씬 더 좋은 것 같아."}
+{"model_id": "240413_full", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다는 겨울을 훨씬 더 선호해. 여름의 높은 기온은 나를 지치게 만들거든. 시원한 바람과 함께 따뜻한 이불 속에서 푹 쉬다가 다시 새로운 마음으로 다시 시작하고 싶어."}
+{"model_id": "240413_full", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다는 겨울이 더 좋다."}
+{"model_id": "240413_full", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘의 일상 기록:\n오늘 나는 학교에 갔다. 집에 돌아와서 공부를 했다. 오늘의 숙제도 끝마치고 이제 곧 잘 준비할 시간이다."}
+{"model_id": "240413_full", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘은 정말 하루 종일 바빴다. 학교를 갔고, 집에 돌아와서 바로 공부를 시작했다. 오늘 할 숙제를 모두 마쳤고, 이제는 잠을 자기 위해 자리에 누워야겠다."}
+{"model_id": "240413_full", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘 나는 학교에 갔었다. 집에 와서 공부를 시작했다. 오늘 과제도 다 끝마치고 이제 잠자리에 들 시간이다."}

streamlit_app_local/user_submit/llm/240413_full_epoch_5.jsonl ADDED

	@@ -0,0 +1,6 @@

+{"model_id": "240413_full_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다는 겨울이 훨씬 더 좋은 것 같아."}
+{"model_id": "240413_full_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보다는 겨울이 훨씬 더 좋다. 겨울에는 추운 날씨 덕분에 산책이나 운동을 하기에도 편하고, 겨울의 정취를 느낄 수 있어서 좋다. 그리고 겨울에는 눈이 내릴 때의 분위기가 정말 낭만적이다."}
+{"model_id": "240413_full_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "난 여름보다는 겨울이 좋아.", "generated": "나는 여름보단 겨울이 더 좋더라. 겨울에는 차가운 바람이 얼굴을 톡톡 두드리고, 눈이 내리면 세상이 하얗고 아름답게 보여."}
+{"model_id": "240413_full_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 내용을 유지한 채, 어휘를 더 길게 바꾸거나 사소한 내용을 추가하여 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘 나는 학교에 방문했다. 집에 돌아와 열심히 공부했다. 오늘의 과제도 모두 끝마치고, 이제는 잠에 들 시간이다."}
+{"model_id": "240413_full_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 중간 중간에 새로운 내용이나 문장을 추가하여 길게 늘려서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘은 학교에 가는 날이었다. 집에서 아침을 먹고 학교에 도착했다. 수업이 시작되었고, 나는 열공 모드로 들어갔다. 숙제도 다 끝냈고 이제 자야겠다."}
+{"model_id": "240413_full_epoch_5", "task": "늘려쓰기", "instruction": "입력으로 주어진 글의 어투를 바꾸거나, 문장을 여러개로 나누어서 다시 써주세요.", "source": "오늘의 일기:\n오늘 나는 학교에 갔다. 집에 와서 공부를 했다. 오늘 숙제도 끝내고 자야겠다.", "generated": "오늘 나는 학교에 갔다 왔다. 집에 와서 숙제를 했다. 오늘 숙제를 끝내면 자야겠다."}