Trying to translate Japanese presentations

#1
by SMHLondon - opened

I use python-pptx to extract the text from each individual shape and then insert it into the example's prompt:
message = [
# {"role": "user", "content": "Translate this English sentence into Japanese.\n" + input("En > ")},
{"role": "user", "content": f"Translate this Japanese sentence into English.\n {text}"},
]
prompts = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)
Is there a better way to make sure the line-break structure of the input carries over to the translation? At the moment, for very short text passages, the translation sometimes repeats the text multiple times or just outputs a continuous run of words.
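One workaround I am considering, as a rough sketch only: split each shape's text on line breaks, translate every non-empty line with its own prompt, and rejoin the results so the break structure is kept by construction. Here `translate_line` is a hypothetical placeholder for the model call (build the message, apply the chat template, generate, decode); it is not part of any library, and I have not checked how this affects translation quality.

```python
from typing import Callable


def translate_preserving_lines(text: str, translate_line: Callable[[str], str]) -> str:
    """Translate each non-empty line on its own and rejoin with the original breaks.

    translate_line is a hypothetical callable wrapping the model call;
    it takes one source line and returns its translation.
    """
    translated = []
    for line in text.split("\n"):
        if line.strip():
            # Send only the stripped line content to the model.
            translated.append(translate_line(line.strip()))
        else:
            # Keep blank lines untouched so the layout survives.
            translated.append(line)
    return "\n".join(translated)
```

The obvious trade-off is that each line is translated without its neighbours as context, so sentences that wrap across several lines may come out worse than when the whole passage is translated at once.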
I set up the tokenizer as follows:
tokenizer = AutoTokenizer.from_pretrained(model_id, token = access_tokenw)
tokenizer.pad_token = tokenizer.eos_token

Any ideas welcome!
