	---
	language:
	- en
	- ja
	tags:
	- nllb
	license: cc-by-nc-4.0
	---

	# NLLB 1.3B fine-tuned on Japanese to English Light Novel translation

	This model was fine-tuned on light novels and web novels for Japanese-to-English translation.

	It can translate sentences and paragraphs of up to 512 tokens.


	## Usage
	```python
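	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

	tokenizer = AutoTokenizer.from_pretrained("thefrigidliquidation/nllb-jaen-1.3B-lightnovels")
	model = AutoModelForSeq2SeqLM.from_pretrained("thefrigidliquidation/nllb-jaen-1.3B-lightnovels")

	# Target language code; see the Honorifics section for the other options.
	tokenizer.tgt_lang = "jpn_Jpan"

	# Japanese source text (a sentence or paragraph of up to 512 tokens).
	text = "マイン、ルッツが迎えに来たよ"
	inputs = tokenizer(text, return_tensors="pt")

	generated_tokens = model.generate(
	    **inputs,
	    # Same id as tokenizer.lang_code_to_id[tokenizer.tgt_lang] on older transformers releases.
	    forced_bos_token_id=tokenizer.convert_tokens_to_ids(tokenizer.tgt_lang),
	    max_new_tokens=1024,
	    no_repeat_ngram_size=6,
	).cpu()

	# Decode the first (and only) sequence in the batch.
	translated_text = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]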
	```

	Generating with diverse beam search seems to work best. Add the following to `model.generate`:
	```python
	num_beams=8,
	num_beam_groups=4,
	do_sample=False,
	```
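As a rough sketch, the base arguments from Usage and the diverse beam search arguments above can be collected in one place. The helper name `build_generation_kwargs` and its `diverse_beams` flag are hypothetical and not part of this model or of `transformers`; the values are the ones shown in this README. Note that `transformers` requires `num_beams` to be divisible by `num_beam_groups`.

```python
def build_generation_kwargs(forced_bos_token_id, diverse_beams=True):
    """Collect the model.generate keyword arguments used in this README.

    Hypothetical helper for illustration only.
    """
    kwargs = {
        "forced_bos_token_id": forced_bos_token_id,
        "max_new_tokens": 1024,
        "no_repeat_ngram_size": 6,
    }
    if diverse_beams:
        # Diverse (group) beam search: 8 beams split into 4 groups, no sampling.
        kwargs.update(num_beams=8, num_beam_groups=4, do_sample=False)
    return kwargs


# Intended use (requires the model and tokenizer from the Usage section):
# model.generate(**inputs, **build_generation_kwargs(bos_id))
```

Passing `diverse_beams=False` returns only the base arguments, which makes it easy to compare the two decoding setups on the same input.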


	## Glossary
	You can provide up to 10 custom translations for nouns and character names at runtime. To do so, surround the Japanese term with term tokens: prefix the word with one of `<t0>, <t1>, ..., <t9>` and suffix it with `</t>`. The model emits the prefix term token in place of the term's translation, and you can then replace that token with your preferred translation using a string replacement.

	For example, in `マイン、ルッツが迎えに来たよ`, if you wish to have `マイン` translated as `Myne`, you would replace `マイン` with `<t0>マイン</t>`. The model will translate `<t0>マイン</t>、ルッツが迎えに来たよ` as `<t0>, Lutz is here to pick you up.` Then simply do a string replacement on the output, replacing `<t0>` with `Myne`.
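The wrap-then-replace workflow described above could be automated along these lines. This is a minimal sketch: `mark_terms` and `restore_terms` are hypothetical helper names, not part of the model or of `transformers`, and the sketch assumes the term tokens come through decoding unchanged, as in the example above.

```python
import re


def mark_terms(text, glossary):
    """Wrap each glossary term in `text` with <tN>...</t> term tokens.

    `glossary` maps a Japanese term to its desired English translation.
    Returns the marked text and a token -> translation mapping.
    """
    if len(glossary) > 10:
        raise ValueError("at most 10 glossary terms are supported")
    if not glossary:
        return text, {}
    # Longest terms first so the regex prefers the longer of two overlapping terms.
    terms = sorted(glossary, key=len, reverse=True)
    token_for = {term: f"<t{i}>" for i, term in enumerate(terms)}
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    # A single pass avoids re-wrapping text inside an already wrapped term.
    marked = pattern.sub(
        lambda m: f"{token_for[m.group(0)]}{m.group(0)}</t>", text
    )
    return marked, {token_for[term]: glossary[term] for term in terms}


def restore_terms(translation, token_to_translation):
    """Replace term tokens in the model output with the chosen translations."""
    for token, target in token_to_translation.items():
        translation = translation.replace(token, target)
    return translation


marked, mapping = mark_terms("マイン、ルッツが迎えに来たよ", {"マイン": "Myne"})
# `marked` is what you would pass to the tokenizer instead of the raw text.
```

After generation, call `restore_terms(translated_text, mapping)` on the decoded output to substitute the chosen translations.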


	## Honorifics
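	You can steer the model toward generating or omitting honorifics by setting the tokenizer's target language code. The model reuses NLLB language codes as control tokens for this; the codes below do not select an actual output language.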

	```python
	# default, the model decides whether to use honorifics
	tokenizer.tgt_lang = "jpn_Jpan"
	# no honorifics, the model is discouraged from using honorifics
	tokenizer.tgt_lang = "zsm_Latn"
	# honorifics, the model is encouraged to use honorifics
	tokenizer.tgt_lang = "zul_Latn"
	```
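Because the three codes are easy to mix up, a small lookup can make the intent explicit. This is only a sketch: the mode names (`"auto"`, `"off"`, `"on"`) and the helper `set_honorific_mode` are hypothetical, while the language codes are the ones listed above.

```python
# Hypothetical mode names mapped to the target language codes from this README.
HONORIFIC_TGT_LANG = {
    "auto": "jpn_Jpan",  # the model decides whether to use honorifics
    "off": "zsm_Latn",   # the model is discouraged from using honorifics
    "on": "zul_Latn",    # the model is encouraged to use honorifics
}


def set_honorific_mode(tokenizer, mode="auto"):
    """Set tokenizer.tgt_lang for the requested honorific mode and return it."""
    if mode not in HONORIFIC_TGT_LANG:
        raise ValueError(f"unknown honorific mode: {mode!r}")
    tokenizer.tgt_lang = HONORIFIC_TGT_LANG[mode]
    return tokenizer.tgt_lang
```

Call `set_honorific_mode(tokenizer, "off")` before computing `forced_bos_token_id`, since that id is derived from `tokenizer.tgt_lang`.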