Improved Code-Mixed Sentence Translation Using Decoder-Only Transformers

Overview

This project addresses the limitations of traditional Neural Machine Translation (NMT) models in translating code-mixed sentences by using a decoder-only transformer model. Inspired by the training methodologies of models like GPT and Llama, the approach uses self-supervised pre-training so the model learns the context of the languages involved directly from raw text. The pre-trained model is then fine-tuned on a much smaller translation dataset, making it effective for translating both regular and code-mixed sentences.

Benefits

  1. Smaller Translation Dataset: Because most of the model's capability comes from self-supervised pre-training, only a small amount of translation data is needed for fine-tuning, which reduces data-preparation overhead.
  2. Rich and Meaningful Translation: By understanding the underlying context of languages, the model provides more accurate and meaningful translations for both regular and code-mixed sentences.
  3. Multilingual Capability: A single model can potentially translate multiple languages, making it a versatile solution for diverse translation needs.

Approach

  1. Context Learning: Train a decoder-only transformer model on a large corpus of text using self-supervised learning. This stage allows the model to grasp the contextual nuances of different languages.
  2. Fine-Tuning: Fine-tune the pre-trained model on a smaller dataset specifically for translation tasks. This step adapts the model to handle translation effectively while retaining its contextual understanding (see the sketch after this list).
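
As a rough illustration of these two stages, here is a minimal sketch using the Hugging Face `transformers` Trainer with a causal-LM objective in both stages. The base checkpoint (`gpt2`), file names, dataset fields (`source`, `target`), prompt format, and hyperparameters are illustrative placeholders, not the exact setup used in this project.

```python
# Minimal two-stage training sketch (causal LM objective throughout).
# Checkpoint, file names, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_checkpoint = "gpt2"  # stand-in for a larger GPT/Llama-style decoder
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_checkpoint)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# Stage 1: self-supervised context learning on a large mixed-language corpus.
corpus = load_dataset("text", data_files="mixed_language_corpus.txt")["train"]
corpus = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Stage 2: fine-tuning on a small set of translation pairs, each rendered
# as a single causal-LM string: "Translate: <source>\nTranslation: <target>".
pairs = load_dataset("json", data_files="translation_pairs.json")["train"]
pairs = pairs.map(
    lambda ex: {"text": f"Translate: {ex['source']}\nTranslation: {ex['target']}"}
)
pairs = pairs.map(tokenize, batched=True, remove_columns=pairs.column_names)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
for stage, data in [("pretrain", corpus), ("finetune", pairs)]:
    Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"out-{stage}",
                               per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=data,
        data_collator=collator,
    ).train()
```

Keeping the same causal-LM objective in both stages is what lets the model carry the contextual understanding gained during pre-training into the much smaller translation fine-tune.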

Example

Here is a comparison between Google Translate and the proposed approach:

  • Text: “Sun ka diameter kya hoga?”

  • Google Translate: “what will happen to sun's demetre”

  • Proposed Approach: “What is the diameter of the Sun?”

In this example, the proposed method produces a more accurate translation than the traditional system, one that respects the context and meaning of the original code-mixed sentence.

Usage

  1. Pre-training: Train the decoder-only transformer model on a large text corpus.
  2. Fine-tuning: Fine-tune the model on a smaller dataset of translated sentences.
  3. Translation: Use the fine-tuned model to translate both regular and code-mixed sentences (an example inference snippet follows this list).
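
For step 3, a possible inference snippet is shown below. It is only a sketch: the checkpoint directory and the `Translate:` / `Translation:` prompt format are assumptions carried over from the training sketch above, not a published interface of this model.

```python
# Sketch of translating a (possibly code-mixed) sentence with the
# fine-tuned checkpoint; the path and prompt format are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "out-finetune"  # assumed directory of the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

def translate(sentence: str) -> str:
    prompt = f"Translate: {sentence}\nTranslation:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    # Keep only the text generated after the prompt.
    return text.split("Translation:")[-1].strip()

print(translate("Sun ka diameter kya hoga?"))  # code-mixed Hindi-English input
```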

Future Work

  • Evaluation: Conduct thorough evaluations and comparisons with other state-of-the-art translation models.
  • Expansion: Explore additional languages and code-mixed scenarios to enhance the model's versatility.

License

This project is licensed under the MIT License.

