# English to Hindi Text Translation using Transformers
This project demonstrates a simple model that translates English text to Hindi using the Hugging Face Transformers library. The model builds on a pre-trained sequence-to-sequence architecture.
## Table of Contents
- [Project Overview](#project-overview)
- [Installation](#installation)
- [Usage](#usage)
- [Model Training and Dataset](#model-training-and-dataset)
- [Model Testing and Deployment](#model-testing-and-deployment)
- [User Interface](#user-interface)
- [Challenges Faced](#challenges-faced)
- [Contributions](#contributions)
## Project Overview
Text translation is an essential task in natural language processing, and this project aims to provide a practical example of building and deploying a translation model. The project covers the following aspects:
- Data preprocessing: Tokenization and dataset preparation.
- Model training: Training a sequence-to-sequence model for English-to-Hindi translation.
- Model testing: Translating text using the trained model.
- User interface: Creating a user-friendly interface for text translation.
## Installation
To run this project, you'll need the following dependencies:
- Python 3.x
- TensorFlow
- Hugging Face Transformers
- Datasets library
- Gradio
You can install the required libraries using the following shell command:
```shell
pip install datasets transformers[sentencepiece] tensorflow gradio -q
```
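After installing, it can help to confirm that the packages are importable before running the app. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical and not part of this project, and the module list is an assumption inferred from the install command above.

```python
import importlib.util
from typing import Iterable, List

# Import names inferred from the pip command above (an assumption, not project code).
REQUIRED_MODULES = ("datasets", "transformers", "sentencepiece", "tensorflow", "gradio")


def missing_packages(modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

An empty list means every listed module was found. Any names returned can be passed back to `pip install`.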
## Usage
Download the folder from here and then run the following command:
```shell
python3 app.py
```
After running this command, open the URL printed in the terminal to use the Gradio interface.
## Model Training and Dataset
The translation model is trained starting from a pre-trained checkpoint. You can check out the pre-trained model [here](https://colab.research.google.com/corgiredirector?site=https%3A%2F%2Fhuggingface.co%2FHelsinki-NLP%2Fopus-mt-en-hi) and the dataset [here](https://huggingface.co/datasets/cfilt/iitb-english-hindi/viewer/cfilt--iitb-english-hindi).
- Download the pre-trained model using the **transformers** library in Python.
- Load the dataset **cfilt/iitb-english-hindi** using the **Datasets** library in Python.
- Initialize the model, tokenizer, and preprocessing function.
- Tokenize the dataset and prepare the training and validation data.
- Compile the model with the **Adam** optimizer and the required parameters.
- Train the model for the desired number of epochs.
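The steps above can be sketched roughly as follows. This is not the project's actual training script. It assumes a Transformers release that still includes the TensorFlow model classes, and it assumes each dataset record has the shape `{"translation": {"en": ..., "hi": ...}}`. The hyperparameter values and the function names `make_preprocess_fn` and `train` are hypothetical placeholders.

```python
from typing import Any, Callable, Dict, List

MODEL_CHECKPOINT = "Helsinki-NLP/opus-mt-en-hi"  # pre-trained model linked above
DATASET_NAME = "cfilt/iitb-english-hindi"        # dataset linked above
MAX_LENGTH = 128       # assumed truncation length
BATCH_SIZE = 16        # assumed
LEARNING_RATE = 2e-5   # assumed
EPOCHS = 1             # assumed


def make_preprocess_fn(tokenizer: Callable[..., Dict[str, Any]],
                       max_length: int = MAX_LENGTH):
    """Build a batched preprocessing function around a tokenizer."""
    def preprocess(batch: Dict[str, List[Dict[str, str]]]) -> Dict[str, Any]:
        # Split each translation pair into English source and Hindi target.
        sources = [pair["en"] for pair in batch["translation"]]
        targets = [pair["hi"] for pair in batch["translation"]]
        # text_target tokenizes the Hindi side and stores it under "labels".
        return tokenizer(sources, text_target=targets,
                         max_length=max_length, truncation=True)
    return preprocess


def train():
    """Fine-tune the checkpoint and return (model, tokenizer)."""
    # Heavy libraries are imported here so the helper above loads without them.
    import tensorflow as tf
    from datasets import load_dataset
    from transformers import (AutoTokenizer, DataCollatorForSeq2Seq,
                              TFAutoModelForSeq2SeqLM)

    tokenizer = AutoTokenizer.from_pretrained(MODEL_CHECKPOINT)
    model = TFAutoModelForSeq2SeqLM.from_pretrained(MODEL_CHECKPOINT)

    raw = load_dataset(DATASET_NAME)
    tokenized = raw.map(make_preprocess_fn(tokenizer), batched=True,
                        remove_columns=raw["train"].column_names)

    # The collator pads each batch dynamically and pads the labels as well.
    collator = DataCollatorForSeq2Seq(tokenizer, model=model, return_tensors="np")
    train_set = model.prepare_tf_dataset(tokenized["train"], batch_size=BATCH_SIZE,
                                         shuffle=True, collate_fn=collator)
    val_set = model.prepare_tf_dataset(tokenized["validation"], batch_size=BATCH_SIZE,
                                       shuffle=False, collate_fn=collator)

    # No loss is passed, so the model uses its built-in sequence-to-sequence loss.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE))
    model.fit(train_set, validation_data=val_set, epochs=EPOCHS)
    return model, tokenizer
```

Keeping the preprocessing function separate from the training loop makes it possible to check the tokenization step on a handful of records before starting a long training run.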
## Model Testing and Deployment
To test the trained model and deploy a user interface:
- Save the trained model at a preferred location.
- Load the model and tokenizer from that location for testing.
- Translate sample input text using the model.
- Deploy a Gradio interface for user-friendly translation.
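A minimal sketch of the save, load, and translate steps is shown below. It assumes the model is a TensorFlow sequence-to-sequence model from a Transformers release that includes the TensorFlow classes. The directory `tf_model/` and the function names are hypothetical and may differ from what `app.py` actually uses.

```python
MODEL_DIR = "tf_model/"  # hypothetical save location
MAX_LENGTH = 128         # assumed maximum length for input and output


def save_translator(model, tokenizer, model_dir: str = MODEL_DIR) -> None:
    """Write the model weights and tokenizer files to the same directory."""
    model.save_pretrained(model_dir)
    tokenizer.save_pretrained(model_dir)


def load_translator(model_dir: str = MODEL_DIR):
    """Load the saved model and tokenizer back from disk."""
    # Imported lazily so this module loads without the heavy dependencies.
    from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = TFAutoModelForSeq2SeqLM.from_pretrained(model_dir)
    return model, tokenizer


def translate(text: str, model, tokenizer, max_length: int = MAX_LENGTH) -> str:
    """Translate one English sentence to Hindi."""
    # Tokenize a single-item batch and return TensorFlow tensors.
    inputs = tokenizer([text], return_tensors="tf",
                       truncation=True, max_length=max_length)
    # Generate output token IDs, then decode the first (only) sequence.
    output_ids = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With a trained model on disk, `model, tokenizer = load_translator()` followed by `translate("How are you?", model, tokenizer)` would return the Hindi translation as a string.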
## User Interface
The Gradio interface provides an interactive way to translate English text to Hindi. To use the interface:
- Run the project and navigate to the specified URL.
- Enter English text in the input box.
- Read the translated Hindi text in the output box.
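An interface like the one described can be wired up roughly as follows. This is a sketch, not the contents of `app.py`. The names `make_handler` and `build_interface` are hypothetical, and `translator` stands for any function that takes English text and returns Hindi text.

```python
from typing import Callable


def make_handler(translator: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a translator so blank input returns an empty string."""
    def handle(text: str) -> str:
        text = (text or "").strip()
        if not text:
            # Skip the model call when there is nothing to translate.
            return ""
        return translator(text)
    return handle


def build_interface(translator: Callable[[str], str]):
    """Create a Gradio interface with one input box and one output box."""
    # Imported lazily so the handler above can be used without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=make_handler(translator),
        inputs=gr.Textbox(lines=3, label="English text"),
        outputs=gr.Textbox(label="Hindi translation"),
        title="English to Hindi Translation",
    )
```

Calling `build_interface(my_translate).launch()` starts a local server and prints the URL to open in a browser, where `my_translate` is whatever translation function the app provides.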
## Challenges Faced
- Searched through many resources on Google and other platforms to find the best dataset for the project.
- Spent a lot of time gathering the right resources to understand transformers, LLMs, and Gradio.
## Contributions
Contributions to this project are welcome! Here are some ways you can contribute:
- Improve the model's translation quality and performance.
- Enhance the user interface for a better user experience.
- Add support for more languages and translation directions.
To contribute, follow these steps:
- Fork this repository.
- Create a new branch for your feature or bug fix.
- Commit your changes and push them to your fork.
- Open a pull request with a detailed description of your changes.