metadata

license: apache-2.0
datasets:
  - squarelike/sharegpt_deepl_ko_translation
language:
  - en
  - ko
pipeline_tag: translation

Gugugo-koen-7B-V1.1

Detail repo: https://github.com/jwj7140/Gugugo

Base Model: Llama-2-ko-7b

Training Dataset: sharegpt_deepl_ko_translation

I trained the model on a single A6000 GPU for 90 hours.

Prompt Template

The Korean labels are literal parts of the prompt and must be kept as written: 한국어 means "Korean", 영어 means "English", and </끝> ("end") marks the end of a sentence.

KO->EN

### 한국어: {sentence}</끝>
### 영어:

EN->KO

### 영어: {sentence}</끝>
### 한국어:
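
The two templates differ only in which label comes first, so a single function can cover both directions. The following is a minimal sketch, not part of the original repo: `build_prompt`, `LABELS`, and `END_MARKER` are hypothetical names, and the newline between the two template lines is assumed from the implementation code in this card.

```python
# Labels and end marker copied from the prompt template above.
LABELS = {"ko": "한국어", "en": "영어"}
END_MARKER = "</끝>"


def build_prompt(sentence: str, src: str, tgt: str) -> str:
    """Hypothetical helper: fill the template for a src -> tgt translation."""
    if src == tgt or src not in LABELS or tgt not in LABELS:
        raise ValueError(f"unsupported direction: {src}->{tgt}")
    # The source sentence is closed with the end marker; the target label is left open
    # so the model continues with the translation.
    return f"### {LABELS[src]}: {sentence}{END_MARKER}\n### {LABELS[tgt]}:"


if __name__ == "__main__":
    print(build_prompt("Nice to meet you!", "en", "ko"))
    print(build_prompt("만나서 반갑습니다!", "ko", "en"))
```

The resulting strings can be passed to the model in place of hand-written prompts.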

Implementation Code

from vllm import LLM, SamplingParams

def make_prompt(data):
    # Wrap each English sentence in the EN->KO prompt template.
    return [f"### 영어: {line}</끝>\n### 한국어:" for line in data]

texts = [
  "Hello world!",
  "Nice to meet you!"
]

prompts = make_prompt(texts)

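# Near-greedy decoding; generation stops at the </끝> end marker.
sampling_params = SamplingParams(temperature=0.01, stop=["</끝>"], max_tokens=700)

# Load the AWQ-quantized checkpoint in half precision.
llm = LLM(model="squarelike/Gugugo-koen-7B-V1.1-AWQ", quantization="awq", dtype="half")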

outputs = llm.generate(prompts, sampling_params)

# Print the translation generated for each input sentence.
for output in outputs:
    print(output.outputs[0].text)
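
The raw generated text may begin with a space after the target label, and with other inference backends the end marker may be left in the output. The following is a small, optional post-processing sketch under those assumptions: `clean_translation` is a hypothetical helper, not part of the original code. With the `stop=["</끝>"]` setting above, vLLM should already drop the marker, so the split is only a safeguard.

```python
END_MARKER = "</끝>"


def clean_translation(text: str) -> str:
    """Hypothetical helper: keep only the text before the end marker, trimmed."""
    # Cut at the first end marker (if any), then drop surrounding whitespace.
    return text.split(END_MARKER, 1)[0].strip()


if __name__ == "__main__":
    print(clean_translation(" 안녕하세요!</끝>\n### 영어:"))
```

In the loop above, this would be applied as `clean_translation(output.outputs[0].text)`.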