---
license: llama2
---

### Description

This is a translation model that builds on the strong Japanese proficiency of Swallow-13b-hf. It focuses primarily on English-to-Japanese translation, with partial support for translating other languages into Japanese.

The base model, tokyotech-llm/Swallow-13b-hf, was fine-tuned with a 4K context. It is aimed mainly at relatively long texts, from about 100 tokens up to 1-2 thousand tokens.

(Multilingual translation and long-context translation become unstable when the model is quantized.)

### Prompt

An XML-like instruction template has been adopted.

---

WMT23 (EN->JA)

| Model                              | BLEU |
|------------------------------------|------|
| GPT4-turbo                         | 22.4 |
| Command R+                         | 22.2 |
| Claude 3 Sonnet                    | 20.9 |
| aixsatoshi-Honyaku-13b-Q6_K.gguf   | 20.8 |
| aixsatoshi-Honyaku-13b-Q8_0.gguf   | 20.7 |
| aixsatoshi-Honyaku-13b-IQ4_NL.gguf | 20.6 |
| aixsatoshi-Honyaku-13b-IQ4_XS.gguf | 20.6 |
| aixsatoshi-Honyaku-13b-Q4_0.gguf   | 20.4 |
| aixsatoshi-Honyaku-13b-IQ3_M.gguf  | 19.8 |
| Command R                          | 18.4 |
| fugumt-en-ja(bs:5)                 | 18.0 |
| C3TR-Adapter.Q5_1.gguf             | 16.0 |
| Mistral-Large                      | 11.3 |

Source: measured by @aorblue [link](https://x.com/aorblue/status/1792951460088685047)

---

### Overview

A translation model that uses the strong Japanese proficiency of Swallow-13b-hf.

[tokyotech-llm/Swallow-13b-hf](https://huggingface.co/tokyotech-llm/Swallow-13b-hf)

- Fine-tuned mainly for English-to-Japanese translation
- Supports translation of texts up to 1-2K tokens
- Partially supports translation from languages other than English into Japanese

### Prompt

An instruction format based on XML-like tags has been adopted.

## Usage

### Prompt format:English to Japanese (main function)

```
: sentences 

: 
```

### Prompt format:Other language to Japanese (experimental)

```
: sentences 

: 
```

### Prompt format:Japanese to English

```
not supported
```

For long texts, using TextStreamer is recommended.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name = "aixsatoshi/Honyaku-13b"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define the streamer (prints tokens to stdout as they are generated)
streamer = TextStreamer(tokenizer)

# Define the English prompt
english_prompt = """
In an era marked by rapid globalization, the intricate interplay between international law, economic policies, and political dynamics has become increasingly complex. Legal frameworks, once confined within national borders, now stretch across continents, necessitating a nuanced understanding of transnational legislation and treaties. As multinational corporations navigate the labyrinthine maze of global markets, economic theories that underpin currency fluctuations, trade imbalances, and fiscal policies are more pertinent than ever. Central to these economic considerations is the concept of market equilibrium, a delicate balance affected by myriad factors including consumer behavior, governmental regulations, and global crises.

Politically, the landscape is equally labyrinthine. Ideological shifts and the resurgence of nationalism have reshaped diplomatic relations, with international agreements and alliances being tested under the strain of geopolitical tensions. The role of supranational entities like the United Nations and the European Union in mediating these conflicts is of paramount importance, as is the need for diplomatic finesse in an increasingly multipolar world. Furthermore, the intersection of politics and economics is evident in the debate over economic sanctions and their efficacy in swaying political decisions.

In this context, understanding the subtleties of rhetoric used in political discourse, and how it interweaves with legal jargon and economic terminology, is crucial. For instance, the rhetoric surrounding fiscal austerity measures often intertwines with legal discourse on budgetary legislation and economic debates on inflation control. Similarly, discussions on constitutional amendments are frequently laden with political undertones, reflecting broader societal issues and ideological divides.

This convergence of legal, economic, and political vernacular presents a unique challenge for machine translation systems, demanding not only linguistic accuracy but also a deep comprehension of the nuanced interplay of these disciplines.
"""

# Prepare the prompt for English to Japanese translation
prompt = f": {english_prompt} \n\n:"

# Tokenize the input text and move it to the device the model was loaded on
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate the output using the model and streamer
output = model.generate(**inputs, max_new_tokens=4096, do_sample=True, top_k=20, top_p=0.95, streamer=streamer)
```

GGUF version created by mmnga (prompt: 973 tokens)

[mmnga/aixsatoshi-Honyaku-13b-gguf](https://huggingface.co/mmnga/aixsatoshi-Honyaku-13b-gguf)

Example output from aixsatoshi-Honyaku-13b-Q8-0.gguf

Output: 1105 tokens, total: 2076 tokens

```
:1. In an era marked by rapid globalization, the intricate interplay between international law, economic policies, and political dynamics has become increasingly complex. Legal frameworks, once confined within national borders, now stretch across continents, necessitating a nuanced understanding of transnational legislation and treaties. As multinational corporations navigate the labyrinthine maze of global markets, economic theories that underpin currency fluctuations, trade imbalances, and fiscal policies are more pertinent than ever. Central to these economic considerations is the concept of market equilibrium, a delicate balance affected by myriad factors including consumer behavior, governmental regulations, and global crises.

2. Politically, the landscape is equally labyrinthine. Ideological shifts and the resurgence of nationalism have reshaped diplomatic relations, with international agreements and alliances being tested under the strain of geopolitical tensions.
The role of supranational entities like the United Nations and the European Union in mediating these conflicts is of paramount importance, as is the need for diplomatic finesse in an increasingly multipolar world. Furthermore, the intersection of politics and economics is evident in the debate over economic sanctions and their efficacy in swaying political decisions.

3. In this context, understanding the subtleties of rhetoric used in political discourse, and how it interweaves with legal jargon and economic terminology, is crucial. For instance, the rhetoric surrounding fiscal austerity measures often intertwines with legal discourse on budgetary legislation and economic debates on inflation control. Similarly, discussions on constitutional amendments are frequently laden with political undertones, reflecting broader societal issues and ideological divides.

4. This convergence of legal, economic, and political vernacular presents a unique challenge for machine translation systems, demanding not only linguistic accuracy but also a deep comprehension of the nuanced interplay of these disciplines. To achieve high-quality translations, it is essential to consider the specific context in which terms are used, recognizing the potential for multiple interpretations based on subtle differences in phrasing. This necessitates the incorporation of advanced natural language processing techniques capable of parsing complex sentence structures and discerning the intended meaning behind each term.

5. Moreover, the impact of cultural differences on language use cannot be overstated. Expressions and idioms that are commonplace in one culture may be completely unfamiliar in another, leading to potential misunderstandings if not properly accounted for in translation. Effective machine translation must therefore incorporate cultural awareness, adapting translations to maintain the original intent and tone while ensuring they resonate appropriately with the target audience.

6. The integration of domain-specific knowledge is also vital for enhancing translation quality. Legal documents, economic reports, and political speeches each have their own specialized lexicons and stylistic conventions. By training machine translation systems on corpora specific to these fields, it becomes possible to achieve greater precision and accuracy. This specialized training allows the system to recognize and correctly translate technical terms, idiomatic expressions, and nuanced language use that might otherwise be lost or mistranslated.

7. Furthermore, ongoing advancements in artificial intelligence and machine learning are poised to revolutionize the field of machine translation. By leveraging deep learning algorithms and large-scale neural networks, translation systems can continuously improve their performance, learning from vast amounts of multilingual data to refine their understanding and accuracy. These technological advancements hold the promise of breaking down language barriers and facilitating more effective communication across diverse fields and regions.

8. In conclusion, the challenge of achieving high-quality machine translation in the context of legal, economic, and political discourse is multifaceted. It requires not only linguistic proficiency but also a deep understanding of the intricate interplay between these disciplines. By incorporating advanced natural language processing techniques, cultural awareness, domain-specific knowledge, and ongoing technological advancements, it is possible to create translation systems that meet the demands of an increasingly interconnected and complex global landscape. As we continue to push the boundaries of what is possible in machine translation, we move closer to a future where language is no longer a barrier but a bridge to greater understanding and collaboration.

:1.
グローバル化が急速に進む時代、国際法、経済政策、政治力学の複雑な相互作用が、かつてないほどに複雑になっています。国際法の枠組みは、国家の境界内に限定されていましたが、現在では大陸を越えて広がり、多国間条約や国際法の網の目を理解するには、より複雑でグローバルな視点が必要となりました。多国籍企業は、複雑怪奇なグローバル市場を縦横無尽に駆け巡っていますが、その際、通貨の変動、貿易の不均衡、財政政策の基礎となる経済理論をより理解することが重要になっています。これらの経済的考慮事項において重要なのは、市場の均衡を保つという概念です。この均衡は、消費者行動、政府規制、世界的な危機など、さまざまな要因によって微妙に影響を受けています。

2. 政治の世界も複雑怪奇です。イデオロギーのシフトと国家主義の復活により、外交関係は地政学的緊張によって試練に立たされています。国連や欧州連合といった超国家的な機関が、この紛争を調停することが不可欠です。また、多極化する世界において、外交官が機微をわきまえた外交術を発揮することがますます重要になっています。経済制裁の有効性が政治決定をどう左右するかという議論でも、政治と経済が交差しています。

3. こうした状況の中、法的、経済的、政治的な言論の微妙なニュアンスを理解することが重要です。例えば、財政緊縮措置の言説は、財政立法や経済のインフレーション・コントロールに関する法律用語と交錯し、政治的な意図を反映することがあります。また、憲法修正に関する議論には、しばしば政治的な背景が潜み、それはより大きな社会問題やイデオロギーの分断を反映しています。

4. このように、法的、経済的、政治的な言葉遣いが複雑に絡み合い、正確さだけでなく、これらの学問分野の微妙な相互作用を理解することが求められます。例えば、財政緊縮措置に関する言説は、財政立法や経済のインフレーション・コントロールに関する法律用語と重なることがあります。同様に、憲法修正に関する議論は、政治的な意図を反映し、社会問題やイデオロギーの分断を反映することがあります。

5. さらに、文化的な違いが言葉遣いに与える影響は無視できません。1つの文化で一般的な言い回しや表現が、他の文化では全く知られていない場合があります。これは、翻訳で意図せずに誤解を招くことになりかねません。適切に翻訳を行うには、文化的な意識が不可欠であり、原文の意図とトーンを維持しながら、対象読者に適切に訴求するような翻訳を行う必要があります。

6. さらに、ドメイン固有の知識の統合は、翻訳品質の向上にもつながります。法律文書、経済報告書、政治演説書などには、それぞれ独自の専門用語やレトリックがあります。これらの分野に特化したコーパスで翻訳システムを訓練することで、正確さと精度が向上します。これにより、専門用語、慣用句、微妙な言葉遣いを正しく翻訳できるようになります。

7. また、人工知能や機械学習の技術進歩は、機械翻訳に変革をもたらす可能性があります。深層学習アルゴリズムや大規模なニューラルネットワークを活用することで、機械翻訳システムは性能を向上させ、膨大なマルチリンガルデータを学習することで理解と精度を高めることができます。これらの技術的進歩は、言語の壁を取り壊し、多様な分野や地域でより効果的なコミュニケーションを可能にする未来への道を切り開いています。

8. 結論として、法、経済、政治の分野における高品質な機械翻訳の実現は、多面的な課題です。それには、言語的能力だけでなく、これらの学問分野の複雑な相互作用への深い理解が必要です。先進的な自然言語処理技術や文化的意識、分野特化型の知識、技術的進歩の継続的な活用により、私たちは言語が障壁ではなく、より深い理解と協力を実現する架け橋となる、より複雑なグローバルな世界への道を歩み続けることができます。機械翻訳の限界を押し広げていく中で、私たちは未来に向けて、言語はもはや障壁ではなく、橋となる世界へと近づいています。
```
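
Since the model targets inputs of roughly 100 tokens up to 1-2 thousand tokens within a 4K context, longer documents may need to be split before translation. The following is a minimal sketch of one way to do that, not part of the model's own tooling: it groups blank-line-separated paragraphs into chunks under a token budget. The names `chunk_paragraphs`, `count_tokens`, and the default budget of 1500 tokens are hypothetical choices for illustration, and the per-paragraph counts are only an approximation of the tokenized length of the joined chunk.

```python
from typing import Callable, List


def chunk_paragraphs(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 1500,  # assumed budget, chosen to stay inside the 1-2K range
) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_tokens.

    A single paragraph that already exceeds the budget is kept as its own
    chunk rather than being split mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for paragraph in paragraphs:
        n = count_tokens(paragraph)
        # Close the current chunk when this paragraph would exceed the budget.
        if current and current_tokens + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    # A whitespace word count stands in for the real tokenizer in this demo.
    sample = "one two three\n\nfour five\n\nsix seven eight nine"
    for chunk in chunk_paragraphs(sample, lambda s: len(s.split()), max_tokens=5):
        print(repr(chunk))
```

With the model loaded as in the example above, a counter such as `lambda s: len(tokenizer(s).input_ids)` could be passed as `count_tokens`, and each returned chunk placed into the prompt template in turn. Splitting on paragraph boundaries keeps sentences intact, at the cost of the model not seeing context from earlier chunks.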