dataset format for translation
#100 by andrejaystevenson - opened
How should I change the dataset cell code to use this dataset (https://huggingface.co/datasets/tep_en_fa_para) for fine-tuning Mistral-7B?
Dataset cell code:
from datasets import load_dataset
dataset = load_dataset("tep_en_fa_para", split = "train")
EOS_TOKEN = tokenizer.eos_token
def formatting_func(example):
    return example["text"] + EOS_TOKEN
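
One possible way to adapt the cell is sketched below. It assumes that each row of tep_en_fa_para has the shape {"translation": {"en": ..., "fa": ...}} rather than a "text" column; check dataset.features to confirm this before relying on it. The prompt wording, the PROMPT_TEMPLATE name, and the placeholder EOS string are illustrative assumptions, not part of the original notebook.

```python
# Assumption: rows look like {"translation": {"en": "...", "fa": "..."}}.
# Verify with `print(dataset.features)` after load_dataset(...).

# Placeholder so the sketch runs on its own; in the notebook,
# use EOS_TOKEN = tokenizer.eos_token instead.
EOS_TOKEN = "</s>"

# Hypothetical prompt template; any consistent wording should work.
PROMPT_TEMPLATE = (
    "Translate the following English text to Persian.\n\n"
    "### English:\n{en}\n\n"
    "### Persian:\n{fa}"
)

def formatting_func(example):
    # Pull the sentence pair out of the nested "translation" field
    # and render it as a single training string ending with EOS.
    pair = example["translation"]
    return PROMPT_TEMPLATE.format(en=pair["en"], fa=pair["fa"]) + EOS_TOKEN

# Small in-memory row standing in for one dataset example.
sample = {"translation": {"en": "Hello.", "fa": "سلام."}}
formatted = formatting_func(sample)
print(formatted)
```

The dataset loading line itself can stay as it is; only the formatting function needs to read the "translation" field instead of "text".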
andrejaystevenson changed discussion status to closed