# English to Hindi Text Translation using Transformers
This project showcases a simple model that translates English text to Hindi using the Hugging Face Transformers library. It uses a pre-trained sequence-to-sequence model as its starting point.
## Table of Contents
- [Project Overview](#project-overview)
- [Installation](#installation)
- [Usage](#usage)
- [Model Training and Dataset](#model-training-and-dataset)
- [Model Testing and Deployment](#model-testing-and-deployment)
- [User Interface](#user-interface)
- [Challenges Faced](#challenges-faced)
- [Contributions](#contributions)
## Project Overview
Text translation is an essential task in natural language processing, and this project aims to provide a practical example of building and deploying a translation model. The project covers the following aspects:
- Data preprocessing: Tokenization and dataset preparation.
- Model training: Training a sequence-to-sequence model for English-to-Hindi translation.
- Model testing: Translating text using the trained model.
- User interface: Creating a user-friendly interface for text translation.
## Installation
To run this project, you'll need the following dependencies:
- Python 3.x
- TensorFlow
- Hugging Face Transformers
- Datasets library
- Gradio

You can install the required libraries using the following shell command:
```shell
pip install datasets "transformers[sentencepiece]" tensorflow gradio -q
```
## Usage
Try the app [here](https://huggingface.co/spaces/Lohith9923/En-Hi-Translation). Enter an English sentence or passage in the input textbox, and the Hindi translation appears in the output box. A few example inputs are listed below the interface for a quick check.
## Model Training and Dataset
The translation model starts from a pre-trained checkpoint and is then trained on an English-Hindi parallel dataset. You can check out the pre-trained model [here](https://colab.research.google.com/corgiredirector?site=https%3A%2F%2Fhuggingface.co%2FHelsinki-NLP%2Fopus-mt-en-hi) and the dataset [here](https://huggingface.co/datasets/cfilt/iitb-english-hindi/viewer/cfilt--iitb-english-hindi).

To train the model:
- Download the pre-trained model using the **transformers** library in Python.
- Load the dataset **cfilt/iitb-english-hindi** using the **Datasets** library in Python.
- Initialize the model, the tokenizer, and the preprocessing function.
- Tokenize the dataset and prepare the training and validation data.
- Compile the model with the **Adam** optimizer and the required parameters.
- Train the model for the desired number of epochs.
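
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script: the hyperparameters (`MAX_LENGTH`, `BATCH_SIZE`, `LEARNING_RATE`, `NUM_EPOCHS`), the helper names, and the `tf_model` output directory are assumptions chosen for the example. It also assumes a Transformers release that still ships the TensorFlow model classes, and that the dataset exposes a `translation` column with `en` and `hi` keys.

```python
MODEL_CHECKPOINT = "Helsinki-NLP/opus-mt-en-hi"
DATASET_NAME = "cfilt/iitb-english-hindi"

# Assumed hyperparameters, chosen only for illustration.
MAX_LENGTH = 128
BATCH_SIZE = 16
LEARNING_RATE = 2e-5
NUM_EPOCHS = 1


def make_preprocess_fn(tokenizer, source_lang="en", target_lang="hi"):
    """Return a batched preprocessing function for `Dataset.map`."""

    def preprocess(examples):
        # Each row is assumed to look like {"translation": {"en": ..., "hi": ...}}.
        inputs = [pair[source_lang] for pair in examples["translation"]]
        targets = [pair[target_lang] for pair in examples["translation"]]
        # `text_target` tokenizes the Hindi side and stores it under "labels".
        return tokenizer(
            inputs, text_target=targets, max_length=MAX_LENGTH, truncation=True
        )

    return preprocess


def run_training(output_dir="tf_model"):
    """Download the model and dataset, fine-tune, and save the result."""
    # Heavy imports live here so the helper above stays importable on its own.
    import tensorflow as tf
    from datasets import load_dataset
    from transformers import (
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        TFAutoModelForSeq2SeqLM,
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_CHECKPOINT)
    model = TFAutoModelForSeq2SeqLM.from_pretrained(MODEL_CHECKPOINT)

    raw = load_dataset(DATASET_NAME)
    tokenized = raw.map(
        make_preprocess_fn(tokenizer),
        batched=True,
        remove_columns=raw["train"].column_names,
    )

    # The collator pads inputs and labels to the longest item in each batch.
    collator = DataCollatorForSeq2Seq(tokenizer, model=model, return_tensors="tf")
    train_set = model.prepare_tf_dataset(
        tokenized["train"], batch_size=BATCH_SIZE, shuffle=True, collate_fn=collator
    )
    val_set = model.prepare_tf_dataset(
        tokenized["validation"],
        batch_size=BATCH_SIZE,
        shuffle=False,
        collate_fn=collator,
    )

    # With no explicit loss, the model falls back to its built-in seq2seq loss.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE))
    model.fit(train_set, validation_data=val_set, epochs=NUM_EPOCHS)

    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
```

Calling `run_training()` downloads the checkpoint and the full dataset, so a GPU runtime and a smaller training subset may be worth considering for a first run.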
## Model Testing and Deployment
To test the trained model and deploy a user interface:
- Save the trained model at a preferred location.
- Load the model and tokenizer from that location for testing.
- Translate sample input text using the model.
- Deploy a Gradio interface for user-friendly translation.
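
As a rough sketch of the loading and translation steps, the snippet below reloads a saved model and tokenizer and translates a batch of sentences. The directory name `tf_model` and the function names `load_translator` and `translate` are hypothetical, and the code assumes the model was saved with `save_pretrained` from a TensorFlow Transformers class.

```python
def load_translator(model_dir="tf_model"):
    """Load a saved TensorFlow seq2seq model and its tokenizer.

    `model_dir` is a hypothetical path; use wherever the model was saved.
    """
    from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = TFAutoModelForSeq2SeqLM.from_pretrained(model_dir)
    return model, tokenizer


def translate(texts, model, tokenizer, max_new_tokens=128):
    """Translate one string or a list of strings; always returns a list."""
    if isinstance(texts, str):
        texts = [texts]
    # Pad to the longest sentence so the batch forms a rectangular tensor.
    batch = tokenizer(list(texts), return_tensors="tf", padding=True, truncation=True)
    generated = model.generate(**batch, max_new_tokens=max_new_tokens)
    # Drop padding and end-of-sequence tokens from the decoded text.
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

With a saved model in place, `model, tokenizer = load_translator()` followed by `translate("How are you?", model, tokenizer)` would return a one-element list containing the Hindi output.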
## User Interface
The Gradio interface provides an interactive way to translate English text to Hindi. To use the interface:
- Run the project and navigate to the specified URL.
- Enter English text in the input box.
- Read the translated Hindi text in the output box.
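
An interface along these lines could be wired up as shown below. This is a sketch under assumptions rather than the Space's actual code: `translate_batch` stands for any function that maps a list of English strings to a list of Hindi strings, and the labels, title, and example sentences are placeholders.

```python
def make_handler(translate_batch):
    """Wrap a batch translation function into a single-text Gradio handler."""

    def handler(text):
        # Gradio may pass None or whitespace; return an empty result for those.
        text = (text or "").strip()
        if not text:
            return ""
        return translate_batch([text])[0]

    return handler


def build_demo(translate_batch):
    """Build (but do not launch) a text-to-text Gradio interface."""
    import gradio as gr

    return gr.Interface(
        fn=make_handler(translate_batch),
        inputs=gr.Textbox(lines=3, label="English text"),
        outputs=gr.Textbox(label="Hindi translation"),
        title="English to Hindi Translation",
        # Placeholder examples shown below the input box.
        examples=[["How are you?"], ["The weather is nice today."]],
    )
```

Calling `build_demo(translate_batch).launch()` starts a local server and prints the URL to open in a browser.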
## Challenges Faced
- Finding the best dataset for the project took extensive searching on Google and other platforms.
- Gathering the right resources for understanding Transformers, LLMs, and Gradio took a lot of time.
## Contributions
Contributions to this project are welcome! Here are some ways you can contribute:
- Improve the model's translation quality and performance.
- Enhance the user interface for a better user experience.
- Add support for more languages and translation directions.

To contribute, follow these steps:
- Fork this repository.
- Create a new branch for your feature or bug fix.
- Commit your changes and push them to your fork.
- Open a pull request with a detailed description of your changes.