language:
  - ko
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: text-generation
model-index:
  - name: Synatra-V0.1-7B
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 55.29
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/Synatra-V0.1-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 76.63
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/Synatra-V0.1-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 55.29
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/Synatra-V0.1-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 55.76
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/Synatra-V0.1-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 72.77
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/Synatra-V0.1-7B
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 19.41
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/Synatra-V0.1-7B
          name: Open LLM Leaderboard

V0.3 IS UP

Link to V0.3

Synatra-V0.1-7B

Made by StableFluffy

Visit my website! (Currently under construction.)

License

This model is strictly for non-commercial use only (cc-by-nc-4.0), which takes priority over the LLAMA 2 COMMUNITY LICENSE AGREEMENT. The "Model" (i.e. the base model, derivatives, and merges/mixes) is completely free to use for non-commercial purposes as long as the included cc-by-nc-4.0 license in any parent repository and the non-commercial use statute remain, regardless of other models' licenses. The license may be changed after a new model is released.

Model Details

Base Model
mistralai/Mistral-7B-Instruct-v0.1

Trained On
A6000 48GB * 8

Instruction format

Due to a mistake during training, [\INST] was applied instead of [/INST]. This will be fixed in v0.2.

To leverage instruction fine-tuning, surround your prompt with the [INST] and [\INST] tokens. The very first instruction should begin with a beginning-of-sentence id; subsequent instructions should not. The assistant's generation ends with the end-of-sentence token id. It is also strongly recommended to add a space at the end of the prompt.

E.g.

text = "<s>[INST] 아이작 뉴턴의 업적을 알려줘. [\INST] "
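
The format described above can also be assembled programmatically. The following is a minimal sketch, not the author's code: `build_prompt` and its `turns` argument are hypothetical names, the `</s>` end-of-sentence string is assumed, and the multi-turn layout (assistant reply closed by the end-of-sentence token, next instruction without a new `<s>`) is inferred from the description above.

```python
BOS = "<s>"            # beginning-of-sentence marker, as in the example above
EOS = "</s>"           # assumed end-of-sentence marker
INST_OPEN = "[INST]"
INST_CLOSE = "[\\INST]"  # backslash variant used by this model version


def build_prompt(turns):
    """Build a prompt from (user_message, assistant_reply_or_None) pairs.

    Only the first instruction is preceded by BOS. Each finished assistant
    reply is closed with EOS. The prompt ends with a trailing space after
    the closing tag when the last reply is None.
    """
    parts = [BOS]
    for user_message, assistant_reply in turns:
        parts.append(f"{INST_OPEN} {user_message} {INST_CLOSE} ")
        if assistant_reply is not None:
            parts.append(f"{assistant_reply}{EOS}")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt([("아이작 뉴턴의 업적을 알려줘.", None)]))
```

Passing `None` as the last reply leaves the prompt open for generation, with the recommended trailing space already in place.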

Model Benchmark

KULLM Evaluation

Evaluation used the dataset and prompts provided in the KULLM v2 (구름v2) repo. Because the GPT-4 used at that time and the current GPT-4 are not exactly identical, the results may differ slightly from the original ones.

| Model | Understandability | Naturalness | Context Maintenance | Interestingness | Instruction Use | Overall Quality |
| --- | --- | --- | --- | --- | --- | --- |
| GPT-3.5 | 0.980 | 2.806 | 2.849 | 2.056 | 0.917 | 3.905 |
| GPT-4 | 0.984 | 2.897 | 2.944 | 2.143 | 0.968 | 4.083 |
| KoAlpaca v1.1 | 0.651 | 1.909 | 1.901 | 1.583 | 0.385 | 2.575 |
| koVicuna | 0.460 | 1.583 | 1.726 | 1.528 | 0.409 | 2.440 |
| KULLM v2 | 0.742 | 2.083 | 2.107 | 1.794 | 0.548 | 3.036 |
| Synatra-V0.1-7B | 0.960 | 2.821 | 2.755 | 2.356 | 0.934 | 4.065 |

KOBEST_BOOLQ, SENTINEG, WIC - ZERO_SHOT

BoolQ, SentiNeg, and Wic were measured using EleutherAI/lm-evaluation-harness.

HellaSwag and COPA have not been run yet because of difficulties encountered while modifying the original code.

NOTE

For BoolQ, the prompts "위 글에 대한 질문에 사실을 확인하는 작업입니다." ("This is a task of verifying facts for a question about the text above.") and "예, 아니오로 대답해주세요." ("Please answer yes or no.") were added to help the instruction model understand the task. For SentiNeg, the prompt "위 문장의 긍정, 부정 여부를 판단하세요." ("Determine whether the sentence above is positive or negative.") was added for the same reason. For Wic, only [INST] and [\INST] were added.
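
As a rough illustration of the note above, the extra instructions could be attached to each task input as follows. This is a sketch under assumptions, not the evaluation code that was actually used: the function names are hypothetical, and the ordering and newline separators inside each prompt are guesses, since the note only lists which strings were added.

```python
# Instruction strings quoted in the note above.
BOOLQ_TASK_HINT = "위 글에 대한 질문에 사실을 확인하는 작업입니다."
BOOLQ_ANSWER_HINT = "예, 아니오로 대답해주세요."
SENTINEG_HINT = "위 문장의 긍정, 부정 여부를 판단하세요."


def wrap_inst(body):
    """Wrap text in the [INST] ... [\\INST] tags, with a trailing space."""
    return f"[INST] {body} [\\INST] "


def boolq_prompt(paragraph, question):
    # Assumed layout: passage, task hint, question, answer-format hint.
    body = "\n".join([paragraph, BOOLQ_TASK_HINT, question, BOOLQ_ANSWER_HINT])
    return wrap_inst(body)


def sentineg_prompt(sentence):
    # Assumed layout: sentence first, then the polarity instruction.
    return wrap_inst(f"{sentence}\n{SENTINEG_HINT}")


def wic_prompt(original_prompt):
    # Wic: only the instruction tags are added around the original prompt.
    return wrap_inst(original_prompt)
```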

| Model | COPA | HellaSwag | BoolQ | SentiNeg | Wic |
| --- | --- | --- | --- | --- | --- |
| EleutherAI/polyglot-ko-12.8b | 0.7937 | 0.5954 | 0.4818 | 0.9117 | 0.3985 |
| Synatra-V0.1-7B | NaN | NaN | 0.849 | 0.8690 | 0.4881 |

Implementation Code

Since the chat_template already contains the instruction format above, you can use the code below.

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("maywell/Synatra-V0.1-7B")
tokenizer = AutoTokenizer.from_pretrained("maywell/Synatra-V0.1-7B")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
]

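# Apply the tokenizer's chat template to the messages and return PyTorch tensors
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

# Move the inputs and the model to the same device
model_inputs = encodeds.to(device)
model.to(device)

# Sample up to 1000 new tokens, then decode the full sequence (prompt included)
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])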

If you run it on oobabooga, your prompt would look like this. Note that a space is needed at the end!

[INST] 링컨에 대해서 알려줘. [\INST] 

Readme format: beomi/llama-2-ko-7b


Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
| --- | --- |
| Avg. | 53.54 |
| ARC (25-shot) | 55.29 |
| HellaSwag (10-shot) | 76.63 |
| MMLU (5-shot) | 55.29 |
| TruthfulQA (0-shot) | 55.76 |
| Winogrande (5-shot) | 72.77 |
| GSM8K (5-shot) | 19.41 |
| DROP (3-shot) | 39.63 |

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
| --- | --- |
| Avg. | 55.86 |
| AI2 Reasoning Challenge (25-Shot) | 55.29 |
| HellaSwag (10-Shot) | 76.63 |
| MMLU (5-Shot) | 55.29 |
| TruthfulQA (0-shot) | 55.76 |
| Winogrande (5-shot) | 72.77 |
| GSM8k (5-shot) | 19.41 |
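
The two result sets report different averages because the first one includes DROP (3-shot) and the second does not. A quick arithmetic check, with the values copied from the tables above (the variable names are illustrative only):

```python
from statistics import mean

# Scores shared by both result sets.
core_scores = {
    "ARC (25-shot)": 55.29,
    "HellaSwag (10-shot)": 76.63,
    "MMLU (5-shot)": 55.29,
    "TruthfulQA (0-shot)": 55.76,
    "Winogrande (5-shot)": 72.77,
    "GSM8K (5-shot)": 19.41,
}
drop_score = 39.63  # DROP (3-shot), present only in the first result set

# Average over seven benchmarks (with DROP) and over six (without DROP).
avg_with_drop = round(mean([*core_scores.values(), drop_score]), 2)
avg_without_drop = round(mean(core_scores.values()), 2)

print(avg_with_drop, avg_without_drop)
```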