diff --git a/indicTrans/IndicTrans_training.ipynb b/IndicTrans_training.ipynb
similarity index 100%
rename from indicTrans/IndicTrans_training.ipynb
rename to IndicTrans_training.ipynb
diff --git a/indicTrans/LICENSE b/LICENSE
similarity index 100%
rename from indicTrans/LICENSE
rename to LICENSE
diff --git a/README.md b/README.md
index 340bd6f020cf9d20ab0202c946a8f7d0542b3f31..0dc4c28fb8fde490f3f76a6bc7509825f362bb0a 100644
--- a/README.md
+++ b/README.md
@@ -1,37 +1,296 @@
----
-title: indic translation
-emoji: 🏢
-colorFrom: gray
-colorTo: pink
-sdk: gradio
-app_file: app.py
-pinned: false
----
+
-# Configuration
+**IndicTrans** is a Transformer-4x (~434M parameters) multilingual NMT model trained on the [Samanantar](https://indicnlp.ai4bharat.org/samanantar) dataset, which is the largest publicly available parallel corpora collection for Indic languages at the time of writing (14 April 2021). It is a single-script model, i.e., we convert all the Indic data to the Devanagari script. This allows for ***better lexical sharing between languages for transfer learning, prevents fragmentation of the subword vocabulary between Indic languages and allows using a smaller subword vocabulary***. We currently release three models (Indic to English, English to Indic, and Indic to Indic) and support the following 11 Indic languages:
-`title`: _string_
-Display title for the Space
+| | | | |
+| ------------- | -------------- | ------------ | ----------- |
+| Assamese (as) | Hindi (hi) | Marathi (mr) | Tamil (ta) |
+| Bengali (bn) | Kannada (kn) | Oriya (or) | Telugu (te) |
+| Gujarati (gu) | Malayalam (ml) | Punjabi (pa) | |
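+
+The single-script idea works because Unicode allocates the major Indic scripts in parallel 128-code-point blocks, where the same letter usually sits at the same offset in each block. The snippet below is a minimal sketch of that offset mapping. It is a simplified illustration, not necessarily how `scripts/preprocess_translate.py` performs the conversion. The names `SCRIPT_BLOCK_START` and `to_devanagari` are hypothetical.
+
+```python
+# Minimal sketch: convert Indic text to Devanagari by Unicode block offset.
+# Assumption: a plain offset mapping; real preprocessing may handle more edge cases.
+
+# Start of each script's 128-code-point Unicode block.
+SCRIPT_BLOCK_START = {
+    "hi": 0x0900, "mr": 0x0900,  # Devanagari
+    "as": 0x0980, "bn": 0x0980,  # Bengali-Assamese
+    "pa": 0x0A00,  # Gurmukhi
+    "gu": 0x0A80,  # Gujarati
+    "or": 0x0B00,  # Oriya
+    "ta": 0x0B80,  # Tamil
+    "te": 0x0C00,  # Telugu
+    "kn": 0x0C80,  # Kannada
+    "ml": 0x0D00,  # Malayalam
+}
+DEVANAGARI_START = 0x0900
+
+
+def to_devanagari(text: str, lang: str) -> str:
+    """Shift every character in `lang`'s script block into the Devanagari block."""
+    start = SCRIPT_BLOCK_START[lang]
+    out = []
+    for ch in text:
+        cp = ord(ch)
+        if start <= cp < start + 0x80:
+            out.append(chr(cp - start + DEVANAGARI_START))
+        else:
+            out.append(ch)  # spaces, ASCII digits, punctuation and Latin text pass through
+    return "".join(out)
+
+
+print(to_devanagari("கடல்", "ta"))  # Tamil input, printed in Devanagari
+```
+
+Not every code point in one block has a real counterpart in Devanagari, so a production converter needs extra rules on top of this offset shift.
+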
-`emoji`: _string_
-Space emoji (emoji-only character allowed)
-`colorFrom`: _string_
-Color for Thumbnail gradient (red, yellow, green, blue, indigo, purple, pink, gray)
-`colorTo`: _string_
-Color for Thumbnail gradient (red, yellow, green, blue, indigo, purple, pink, gray)
-`sdk`: _string_
-Can be either `gradio` or `streamlit`
+- [Updates](#updates)
+- [Download IndicTrans models:](#download-indictrans-models)
+- [Using the model for translating any input](#using-the-model-for-translating-any-input)
+- [Finetuning the model on your input dataset](#finetuning-the-model-on-your-input-dataset)
+- [Mining Indic to Indic pairs from english centric corpus](#mining-indic-to-indic-pairs-from-english-centric-corpus)
+- [Installation](#installation)
+- [How to train the indictrans model on your training data?](#how-to-train-the-indictrans-model-on-your-training-data)
+- [Network & Training Details](#network--training-details)
+- [Folder Structure](#folder-structure)
+- [Citing](#citing)
+  - [License](#license)
+  - [Contributors](#contributors)
+  - [Contact](#contact)
-`sdk_version` : _string_
-Only applicable for `streamlit` SDK.
-See [doc](https://hf.co/docs/hub/spaces) for more info on supported versions.
-`app_file`: _string_
-Path to your main application file (which contains either `gradio` or `streamlit` Python code).
-Path is relative to the root of the repository.
+## Updates
+18 December 2021
-`pinned`: _boolean_
-Whether the Space stays on top of your list.
+```
+- Tutorials updated with latest model links
+```
+
+
+26 November 2021
+```
+- v0.3 models are now available for download
+```
+
+27 June 2021
+```
+- Updated links for indic to indic model
+- Add more comments to training scripts
+- Add link to [Samanantar Video](https://youtu.be/QwYPOd1eBtQ?t=383)
+- Add folder structure in readme
+- Add python wrapper for model inference
+```
+
+09 June 2021
+```
+- Updated links for models
+- Added Indic to Indic model
+```
+
+09 May 2021
+```
+- Added fix for finetuning on datasets where some language pairs are not present. Previously, the script assumed the finetuning dataset would have data for all 11 Indic language pairs
+- Added colab notebook for finetuning instructions
+```
+
+
+## Download IndicTrans models:
+
+Indic to English: [v0.3](https://storage.googleapis.com/samanantar-public/V0.3/models/indic-en.zip)
+
+English to Indic: [v0.3](https://storage.googleapis.com/samanantar-public/V0.3/models/en-indic.zip)
+
+Indic to Indic: [v0.3](https://storage.googleapis.com/samanantar-public/V0.3/models/m2m.zip)
+
+
+
+## Using the model for translating any input
+
+The model is trained on single sentences, so when using our command line interface you need to split paragraphs into sentences before running the translation. (The Python interface provides a `translate_paragraph` method that handles multi-sentence input.)
+
+Note: IndicTrans is trained with a max sequence length of **200** tokens (subwords). If your sentence is too long (> 200 tokens), the sentence will be truncated to 200 tokens before translation.
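+
+If you want to know in advance which inputs will be truncated, you can count subwords after BPE segmentation. The sketch below assumes the input has already been segmented by subword-nmt, so subwords are space-separated and non-final pieces carry the `@@` marker. `truncate_bpe` is a hypothetical helper, not part of this repository. Like plain truncation, it may cut a word in the middle.
+
+```python
+# Minimal sketch: flag and truncate BPE-segmented lines longer than 200 subwords.
+# Assumption: subword-nmt style input ("trans@@ lation"), one sentence per line.
+from typing import Tuple
+
+MAX_SUBWORDS = 200
+
+
+def truncate_bpe(line: str, max_len: int = MAX_SUBWORDS) -> Tuple[str, bool]:
+    """Return (possibly truncated line, whether truncation happened)."""
+    subwords = line.split()
+    if len(subwords) <= max_len:
+        return line, False
+    return " ".join(subwords[:max_len]), True
+
+
+long_line = " ".join(["trans@@", "lation"] * 150)  # 300 subwords
+short, was_truncated = truncate_bpe(long_line)
+print(len(short.split()), was_truncated)  # 200 True
+```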
+
+Here is an example snippet to split paragraphs into sentences for English and Indic languages supported by our model:
+```python
+# Install these libraries first:
+# pip install mosestokenizer
+# pip install indic-nlp-library
+
+from mosestokenizer import MosesSentenceSplitter
+from indicnlp.tokenize import sentence_tokenize
+
+INDIC = ["as", "bn", "gu", "hi", "kn", "ml", "mr", "or", "pa", "ta", "te"]
+
+def split_sentences(paragraph, language):
+    """Split a paragraph into a list of sentences (English or a supported Indic language)."""
+    if language == "en":
+        # Moses sentence splitter for English
+        with MosesSentenceSplitter(language) as splitter:
+            return splitter([paragraph])
+    elif language in INDIC:
+        # Rule-based sentence splitter from the Indic NLP Library
+        return sentence_tokenize.sentence_split(paragraph, lang=language)
+    raise ValueError(f"Unsupported language: {language}")
+
+split_sentences("""COVID-19 is caused by infection with the severe acute respiratory
+syndrome coronavirus 2 (SARS-CoV-2) virus strain. The disease is mainly transmitted via the respiratory
+route when people inhale droplets and particles that infected people release as they breathe, talk, cough, sneeze, or sing. """, language='en')
+
+# Output:
+# ['COVID-19 is caused by infection with the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus strain.',
+#  'The disease is mainly transmitted via the respiratory route when people inhale droplets and particles that infected people release as they breathe, talk, cough, sneeze, or sing.']
+
+split_sentences("""இத்தொற்றுநோய் உலகளாவிய சமூக மற்றும் பொருளாதார சீர்குலைவை ஏற்படுத்தியுள்ளது.இதனால் பெரும் பொருளாதார மந்தநிலைக்குப் பின்னர் உலகளவில் மிகப்பெரிய மந்தநிலை ஏற்பட்டுள்ளது. இது விளையாட்டு,மத, அரசியல் மற்றும் கலாச்சார நிகழ்வுகளை ஒத்திவைக்க அல்லது ரத்து செய்ய வழிவகுத்தது.
+அச்சம் காரணமாக முகக்கவசம், கிருமிநாசினி உள்ளிட்ட பொருட்களை அதிக நபர்கள் வாங்கியதால் விநியோகப் பற்றாக்குறை ஏற்பட்டது.""",
+                language='ta')
+
+# Output:
+# ['இத்தொற்றுநோய் உலகளாவிய சமூக மற்றும் பொருளாதார சீர்குலைவை ஏற்படுத்தியுள்ளது.',
+#  'இதனால் பெரும் பொருளாதார மந்தநிலைக்குப் பின்னர் உலகளவில் மிகப்பெரிய மந்தநிலை ஏற்பட்டுள்ளது.',
+#  'இது விளையாட்டு,மத, அரசியல் மற்றும் கலாச்சார நிகழ்வுகளை ஒத்திவைக்க அல்லது ரத்து செய்ய வழிவகுத்தது.',
+#  'அச்சம் காரணமாக முகக்கவசம், கிருமிநாசினி உள்ளிட்ட பொருட்களை அதிக நபர்கள் வாங்கியதால் விநியோகப் பற்றாக்குறை ஏற்பட்டது.']
+```
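+
+Once a paragraph is split, the sentences can be translated in one batch and joined back together. The sketch below shows that flow with pluggable functions. `translate_paragraph_sketch`, `naive_split` and `fake_translate` are hypothetical stand-ins, not the IndicTrans API. In practice you would pass `split_sentences` from the snippet above and your own batch-translation call.
+
+```python
+# Minimal sketch of paragraph translation with a sentence-level model.
+# Assumption: `split` and `translate_batch` are user-supplied hooks (hypothetical).
+from typing import Callable, List
+
+
+def translate_paragraph_sketch(
+    paragraph: str,
+    split: Callable[[str], List[str]],
+    translate_batch: Callable[[List[str]], List[str]],
+) -> str:
+    """Split a paragraph, translate all sentences in one batch, and rejoin them."""
+    sentences = [s.strip() for s in split(paragraph) if s.strip()]
+    if not sentences:
+        return ""
+    translations = translate_batch(sentences)
+    if len(translations) != len(sentences):
+        raise ValueError("translate_batch must return one output per input sentence")
+    # Joining with a single space is a simplification.
+    return " ".join(translations)
+
+
+# Toy stand-ins for illustration only.
+def naive_split(text: str) -> List[str]:
+    return [s + "." for s in text.split(".") if s.strip()]
+
+
+def fake_translate(batch: List[str]) -> List[str]:
+    return [s.upper() for s in batch]
+
+
+print(translate_paragraph_sketch("One. Two.", naive_split, fake_translate))  # ONE. TWO.
+```
+
+Batching all sentences of a paragraph into one call avoids reloading or re-invoking the model once per sentence.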
+
+Follow the Colab notebooks below to set up the environment, download the trained _IndicTrans_ models, and translate your own text.
+
+Command line interface --> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AI4Bharat/indicTrans/blob/main/indictrans_fairseq_inference.ipynb)
+
+
+Python interface --> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AI4Bharat/indicTrans/blob/main/indicTrans_python_interface.ipynb)
+
+The Python interface is useful if you want to reuse the loaded model for multiple translations without reinitializing it each time.
+
+
+## Finetuning the model on your input dataset
+
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AI4Bharat/indicTrans/blob/main/indicTrans_Finetuning.ipynb)
+
+The Colab notebook can be used to set up the environment, download the trained _IndicTrans_ models, and prepare your custom dataset for finetuning the IndicTrans model. It also includes a section on mining Indic to Indic data from an English-centric corpus for finetuning the Indic to Indic model.
+
+**Note**: Since this is a big model (~434M parameters), you might not be able to train with reasonable batch sizes on the free Google Colab tier. We are planning to release smaller models (after pruning/distillation) soon.
+
+## Mining Indic to Indic pairs from english centric corpus
+
+The `extract_non_english_pairs` function in `scripts/extract_non_english_pairs.py` can be used to mine Indic to Indic pairs from an English-centric corpus.
+
+As described in the [paper](https://arxiv.org/pdf/2104.05596.pdf) (section 2.5), we use a very strict deduplication criterion to avoid creating very similar parallel sentences. For example, if an English sentence is aligned to *M* Hindi sentences and *N* Tamil sentences, we would get *M* × *N* hi-ta pairs. These pairs would be very similar to each other and would not contribute much to training, so we retain only 1 randomly chosen pair out of these *M* × *N* pairs.
+
+```python
+def extract_non_english_pairs(indir, outdir, LANGS):
+    """
+    Extracts non-English pair parallel corpora.
+
+    indir: contains English-centric data in the following form:
+        - a directory named en-xx for each language xx
+        - each directory contains a train.en and a train.xx
+    outdir: output directory to store the mined data for each pair.
+        One directory is created for each pair.
+    LANGS: list of languages in the corpus (other than English).
+        The language codes must correspond to the ones used in the
+        files and directories in indir. Preferably, sort the languages
+        in this list in alphabetical order. outdir will contain data for xx-yy
+        but not for yy-xx, so it is convenient to have this list in sorted order.
+    """
+```
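+
+The deduplication rule above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the code in `scripts/extract_non_english_pairs.py`. The function name `mine_pivot_pairs` and its in-memory list inputs are hypothetical; the real function reads `train.en` and `train.xx` files from `indir`.
+
+```python
+import random
+from collections import defaultdict
+from typing import Dict, List, Tuple
+
+
+def mine_pivot_pairs(
+    en_xx: List[Tuple[str, str]],
+    en_yy: List[Tuple[str, str]],
+    seed: int = 0,
+) -> List[Tuple[str, str]]:
+    """Mine xx-yy pairs via shared English sentences, keeping one pair per English sentence."""
+    xx_by_en: Dict[str, List[str]] = defaultdict(list)
+    for en, xx in en_xx:
+        xx_by_en[en].append(xx)
+    yy_by_en: Dict[str, List[str]] = defaultdict(list)
+    for en, yy in en_yy:
+        yy_by_en[en].append(yy)
+
+    rng = random.Random(seed)
+    pairs = []
+    # Only English sentences aligned in both corpora can act as a pivot.
+    for en in sorted(xx_by_en.keys() & yy_by_en.keys()):
+        # Choosing xx and yy independently and uniformly picks one of the
+        # M * N candidate pairs uniformly at random.
+        pairs.append((rng.choice(xx_by_en[en]), rng.choice(yy_by_en[en])))
+    return pairs
+
+
+en_hi = [("hello", "hi-1"), ("hello", "hi-2"), ("bye", "hi-3")]
+en_ta = [("hello", "ta-1"), ("hello", "ta-2"), ("hello", "ta-3"), ("thanks", "ta-4")]
+print(mine_pivot_pairs(en_hi, en_ta))  # one (hi, ta) pair pivoted on "hello"
+```
+
+Here "hello" has 2 × 3 = 6 candidate hi-ta pairs, but only one is kept.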
+
+## Installation
+
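+```bash
+# clone the IndicTrans repository and the helper libraries it uses
+git clone https://github.com/AI4Bharat/indicTrans.git
+cd indicTrans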
+git clone https://github.com/anoopkunchukuttan/indic_nlp_library.git
+git clone https://github.com/anoopkunchukuttan/indic_nlp_resources.git
+git clone https://github.com/rsennrich/subword-nmt.git
+# install required libraries
+pip install sacremoses pandas mock sacrebleu tensorboardX pyarrow indic-nlp-library
+
+# Install fairseq from source
+git clone https://github.com/pytorch/fairseq.git
+cd fairseq
+pip install --editable ./
+
+```
+
+
+## How to train the indictrans model on your training data?
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AI4Bharat/indicTrans/blob/main/IndicTrans_training.ipynb)
+
+
+Follow the Colab notebook to set up the environment, download the dataset, and train the IndicTrans model.
+
+## Network & Training Details
+
+- Architecture: IndicTrans uses 6 encoder and 6 decoder layers, input embeddings of size 1536, 16 attention heads, and a feedforward dimension of 4096, for a total of 434M parameters
+- Loss: Cross entropy loss
+- Optimizer: Adam
+- Label Smoothing: 0.1
+- Gradient clipping: 1.0
+- Learning rate: 5e-4
+- Warmup steps: 4000
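+
+The learning rate and warmup steps above work together through a learning-rate schedule. The list does not name the scheduler, so this sketch assumes linear warmup followed by inverse-square-root decay, the behaviour of fairseq's `inverse_sqrt` scheduler. Treat it as an assumption, not the exact IndicTrans configuration. `lr_at` is a hypothetical helper.
+
+```python
+import math
+
+PEAK_LR = 5e-4       # learning rate from the list above
+WARMUP_STEPS = 4000  # warmup steps from the list above
+
+
+def lr_at(step: int, peak_lr: float = PEAK_LR, warmup: int = WARMUP_STEPS) -> float:
+    """Linear warmup to peak_lr, then decay proportional to 1/sqrt(step).
+
+    Assumption: inverse-square-root schedule with warmup starting from 0.
+    """
+    if step < 1:
+        raise ValueError("step must be >= 1")
+    if step <= warmup:
+        return peak_lr * step / warmup
+    return peak_lr * math.sqrt(warmup / step)
+
+
+for s in (1000, 4000, 16000):
+    print(s, lr_at(s))
+```
+
+Under this assumption the rate peaks at 5e-4 at step 4000 and halves by step 16000.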
+
+Please refer to sections 4 and 5 of our [paper](https://arxiv.org/ftp/arxiv/papers/2104/2104.05596.pdf) for more details on the training/experimental setup.
+
+## Folder Structure
+```
+
+IndicTrans
+│ .gitignore
+│ apply_bpe_traindevtest_notag.sh # apply bpe for joint vocab (Train, dev and test)
+│ apply_single_bpe_traindevtest_notag.sh # apply bpe for separate vocab (Train, dev and test)
+│ binarize_training_exp.sh # binarize the training data after preprocessing for fairseq-training
+│ compute_bleu.sh # Compute BLEU scores with postprocessing after translating with `joint_translate.sh`
+│ indictrans_fairseq_inference.ipynb # colab example to show how to use model for inference
+│ indicTrans_Finetuning.ipynb # colab example to show how to use model for finetuning on custom domain data
+│ joint_translate.sh # used for inference (see colab inference notebook for more details on usage)
+│ learn_bpe.sh # learning joint bpe on preprocessed text
+│ learn_single_bpe.sh # learning separate bpe on preprocessed text
+│ LICENSE
+│ prepare_data.sh # prepare data given an experiment dir (this does preprocessing,
+│ # building vocab, binarization ) for bilingual training
+│ prepare_data_joint_training.sh # prepare data given an experiment dir (this does preprocessing,
+│ # building vocab, binarization ) for joint training
+│ README.md
+│
+├───legacy # old unused scripts
+├───model_configs # custom model configurations are stored here
+│ custom_transformer.py # contains custom 4x transformer models
+│ __init__.py
+├───inference
+│ custom_interactive.py # for python wrapper around fairseq-interactive
+│ engine.py # python interface for model inference
+└───scripts # stores python scripts that are used by other bash scripts
+ │ add_joint_tags_translate.py # add lang tags to the processed training data for bilingual training
+ │ add_tags_translate.py # add lang tags to the processed training data for joint training
+ │ clean_vocab.py # clean vocabulary after building with subword_nmt
+ │ concat_joint_data.py # concatenates lang pair data and creates text files to keep track
+ │ # of number of lines in each lang pair.
+ │ extract_non_english_pairs.py # Mining Indic to Indic pairs from english centric corpus
+ │ postprocess_translate.py # Postprocesses translations
+ │ preprocess_translate.py # Preprocesses text for translation, including script conversion (from Indic scripts to Devanagari)
+ │ remove_large_sentences.py # to remove large sentences from training data
+ └───remove_train_devtest_overlaps.py # Finds and removes data overlapping between the train set and the dev/test sets
+```
+
+
+## Citing
+
+If you are using any of the resources, please cite the following article:
+```
+@misc{ramesh2021samanantar,
+ title={Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages},
+ author={Gowtham Ramesh and Sumanth Doddapaneni and Aravinth Bheemaraj and Mayank Jobanputra and Raghavan AK and Ajitesh Sharma and Sujit Sahoo and Harshita Diddee and Mahalakshmi J and Divyanshu Kakwani and Navneet Kumar and Aswin Pradeep and Kumar Deepak and Vivek Raghavan and Anoop Kunchukuttan and Pratyush Kumar and Mitesh Shantadevi Khapra},
+ year={2021},
+ eprint={2104.05596},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+}
+```
+
+We would like to hear from you if:
+
+- You are using our resources. Please let us know how you are putting these resources to use.
+- You have any feedback on these resources.
+
+
+
+### License
+
+The IndicTrans code (and models) are released under the MIT License.
+
+
+### Contributors
+
+- Gowtham Ramesh, ([RBCDSAI](https://rbcdsai.iitm.ac.in), [IITM](https://www.iitm.ac.in))
+- Sumanth Doddapaneni, ([RBCDSAI](https://rbcdsai.iitm.ac.in), [IITM](https://www.iitm.ac.in))
+- Aravinth Bheemaraj, ([Tarento](https://www.linkedin.com/company/tarento-group/), [EkStep](https://ekstep.in))
+- Mayank Jobanputra, ([IITM](https://www.iitm.ac.in))
+- Raghavan AK, ([AI4Bharat](https://ai4bharat.org))
+- Ajitesh Sharma, ([Tarento](https://www.linkedin.com/company/tarento-group/), [EkStep](https://ekstep.in))
+- Sujit Sahoo, ([Tarento](https://www.linkedin.com/company/tarento-group/), [EkStep](https://ekstep.in))
+- Harshita Diddee, ([AI4Bharat](https://ai4bharat.org))
+- Mahalakshmi J, ([AI4Bharat](https://ai4bharat.org))
+- Divyanshu Kakwani, ([IITM](https://www.iitm.ac.in), [AI4Bharat](https://ai4bharat.org))
+- Navneet Kumar, ([Tarento](https://www.linkedin.com/company/tarento-group/), [EkStep](https://ekstep.in))
+- Aswin Pradeep, ([Tarento](https://www.linkedin.com/company/tarento-group/), [EkStep](https://ekstep.in))
+- Kumar Deepak, ([Tarento](https://www.linkedin.com/company/tarento-group/), [EkStep](https://ekstep.in))
+- Vivek Raghavan, ([EkStep](https://ekstep.in))
+- Anoop Kunchukuttan, ([Microsoft](https://www.microsoft.com/en-in/), [AI4Bharat](https://ai4bharat.org))
+- Pratyush Kumar, ([RBCDSAI](https://rbcdsai.iitm.ac.in), [AI4Bharat](https://ai4bharat.org), [IITM](https://www.iitm.ac.in))
+- Mitesh Shantadevi Khapra, ([RBCDSAI](https://rbcdsai.iitm.ac.in), [AI4Bharat](https://ai4bharat.org), [IITM](https://www.iitm.ac.in))
+
+
+
+### Contact
+
+- Anoop Kunchukuttan ([anoop.kunchukuttan@gmail.com](mailto:anoop.kunchukuttan@gmail.com))
+- Mitesh Khapra ([miteshk@cse.iitm.ac.in](mailto:miteshk@cse.iitm.ac.in))
+- Pratyush Kumar ([pratyush@cse.iitm.ac.in](mailto:pratyush@cse.iitm.ac.in))
diff --git a/indicTrans/api.py b/api.py
similarity index 100%
rename from indicTrans/api.py
rename to api.py
diff --git a/indicTrans/apply_bpe_traindevtest_notag.sh b/apply_bpe_traindevtest_notag.sh
similarity index 100%
rename from indicTrans/apply_bpe_traindevtest_notag.sh
rename to apply_bpe_traindevtest_notag.sh
diff --git a/indicTrans/apply_single_bpe_traindevtest_notag.sh b/apply_single_bpe_traindevtest_notag.sh
similarity index 100%
rename from indicTrans/apply_single_bpe_traindevtest_notag.sh
rename to apply_single_bpe_traindevtest_notag.sh
diff --git a/indicTrans/binarize_training_exp.sh b/binarize_training_exp.sh
similarity index 100%
rename from indicTrans/binarize_training_exp.sh
rename to binarize_training_exp.sh
diff --git a/indicTrans/compute_bleu.sh b/compute_bleu.sh
similarity index 100%
rename from indicTrans/compute_bleu.sh
rename to compute_bleu.sh
diff --git a/indicTrans/.gitignore b/indicTrans/.gitignore
deleted file mode 100644
index 5d92548a223c3df0f98d1fcb0880402ec735b754..0000000000000000000000000000000000000000
--- a/indicTrans/.gitignore
+++ /dev/null
@@ -1,143 +0,0 @@
-#ignore libs folder we use
-indic_nlp_library
-indic_nlp_resources
-subword-nmt
-
-# Byte-compiled / optimized / DLL files
-__pycache__/
-*.py[cod]
-*$py.class
-
-# C extensions
-*.so
-
-# Distribution / packaging
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-share/python-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-MANIFEST
-
-# PyInstaller
-# Usually these files are written by a python script from a template
-# before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
-# Unit test / coverage reports
-htmlcov/
-.tox/
-.nox/
-.coverage
-.coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*.cover
-*.py,cover
-.hypothesis/
-.pytest_cache/
-cover/
-
-# Translations
-*.mo
-*.pot
-
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
-# Sphinx documentation
-docs/_build/
-
-# PyBuilder
-.pybuilder/
-target/
-
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-# For a library or package, you might want to ignore these files since the code is
-# intended to run in multiple environments; otherwise, check them in:
-# .python-version
-
-# pipenv
-# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-# However, in case of collaboration, if having platform-specific dependencies or dependencies
-# having no cross-platform support, pipenv may install dependencies that don't work, or not
-# install all needed dependencies.
-#Pipfile.lock
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow
-__pypackages__/
-
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
-
-# Environments
-.env
-.venv
-env/
-venv/
-ENV/
-env.bak/
-venv.bak/
-
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
-
-# mkdocs documentation
-/site
-
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
\ No newline at end of file
diff --git a/indicTrans/README.md b/indicTrans/README.md
deleted file mode 100644
index 0dc4c28fb8fde490f3f76a6bc7509825f362bb0a..0000000000000000000000000000000000000000
--- a/indicTrans/README.md
+++ /dev/null
@@ -1,296 +0,0 @@
-
-
-**IndicTrans** is a Transformer-4x ( ~434M ) multilingual NMT model trained on [Samanantar](https://indicnlp.ai4bharat.org/samanantar) dataset which is the largest publicly available parallel corpora collection for Indic languages at the time of writing ( 14 April 2021 ). It is a single script model i.e we convert all the Indic data to the Devanagari script which allows for ***better lexical sharing between languages for transfer learning, prevents fragmentation of the subword vocabulary between Indic languages and allows using a smaller subword vocabulary***. We currently release two models - Indic to English and English to Indic and support the following 11 indic languages:
-
-| | | | |
-| ------------- | -------------- | ------------ | ----------- |
-| Assamese (as) | Hindi (hi) | Marathi (mr) | Tamil (ta) |
-| Bengali (bn) | Kannada (kn) | Oriya (or) | Telugu (te) |
-| Gujarati (gu) | Malayalam (ml) | Punjabi (pa) |
-
-
-
-
-- [Updates](#updates)
-- [Download IndicTrans models:](#download-indictrans-models)
-- [Using the model for translating any input](#using-the-model-for-translating-any-input)
-- [Finetuning the model on your input dataset](#finetuning-the-model-on-your-input-dataset)
-- [Mining Indic to Indic pairs from english centric corpus](#mining-indic-to-indic-pairs-from-english-centric-corpus)
-- [Installation](#installation)
-- [How to train the indictrans model on your training data?](#how-to-train-the-indictrans-model-on-your-training-data)
-- [Network & Training Details](#network--training-details)
-- [Folder Structure](#folder-structure)
-- [Citing](#citing)
- - [License](#license)
- - [Contributors](#contributors)
- - [Contact](#contact)
-
-
-## Updates
-Click to expand
-18 December 2021
-
-```
-Tutorials updated with latest model links
-```
-
-
-26 November 2021
-```
- - v0.3 models are now available for download
-```
-
-27 June 2021
-```
-- Updated links for indic to indic model
-- Add more comments to training scripts
-- Add link to [Samanantar Video](https://youtu.be/QwYPOd1eBtQ?t=383)
-- Add folder structure in readme
-- Add python wrapper for model inference
-```
-
-09 June 2021
-```
-- Updated links for models
-- Added Indic to Indic model
-```
-
-09 May 2021
-```
-- Added fix for finetuning on datasets where some lang pairs are not present. Previously the script assumed the finetuning dataset will have data for all 11 indic lang pairs
-- Added colab notebook for finetuning instructions
-```
-
-
-## Download IndicTrans models:
-
-Indic to English: [v0.3](https://storage.googleapis.com/samanantar-public/V0.3/models/indic-en.zip)
-
-English to Indic: [v0.3](https://storage.googleapis.com/samanantar-public/V0.3/models/en-indic.zip)
-
-Indic to Indic: [v0.3](https://storage.googleapis.com/samanantar-public/V0.3/models/m2m.zip)
-
-
-
-## Using the model for translating any input
-
-The model is trained on single sentences and hence, users need to split parapgraphs to sentences before running the translation when using our command line interface (The python interface has `translate_paragraph` method to handle multi sentence translations).
-
-Note: IndicTrans is trained with a max sequence length of **200** tokens (subwords). If your sentence is too long (> 200 tokens), the sentence will be truncated to 200 tokens before translation.
-
-Here is an example snippet to split paragraphs into sentences for English and Indic languages supported by our model:
-```python
-# install these libraries
-# pip install mosestokenizer
-# pip install indic-nlp-library
-
-from mosestokenizer import *
-from indicnlp.tokenize import sentence_tokenize
-
-INDIC = ["as", "bn", "gu", "hi", "kn", "ml", "mr", "or", "pa", "ta", "te"]
-
-def split_sentences(paragraph, language):
- if language == "en":
- with MosesSentenceSplitter(language) as splitter:
- return splitter([paragraph])
- elif language in INDIC:
- return sentence_tokenize.sentence_split(paragraph, lang=language)
-
-split_sentences("""COVID-19 is caused by infection with the severe acute respiratory
-syndrome coronavirus 2 (SARS-CoV-2) virus strain. The disease is mainly transmitted via the respiratory
-route when people inhale droplets and particles that infected people release as they breathe, talk, cough, sneeze, or sing. """, language='en')
-
->> ['COVID-19 is caused by infection with the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus strain.',
- 'The disease is mainly transmitted via the respiratory route when people inhale droplets and particles that infected people release as they breathe, talk, cough, sneeze, or sing.']
-
-split_sentences("""இத்தொற்றுநோய் உலகளாவிய சமூக மற்றும் பொருளாதார சீர்குலைவை ஏற்படுத்தியுள்ளது.இதனால் பெரும் பொருளாதார மந்தநிலைக்குப் பின்னர் உலகளவில் மிகப்பெரிய மந்தநிலை ஏற்பட்டுள்ளது. இது விளையாட்டு,மத, அரசியல் மற்றும் கலாச்சார நிகழ்வுகளை ஒத்திவைக்க அல்லது ரத்து செய்ய வழிவகுத்தது.
-அச்சம் காரணமாக முகக்கவசம், கிருமிநாசினி உள்ளிட்ட பொருட்களை அதிக நபர்கள் வாங்கியதால் விநியோகப் பற்றாக்குறை ஏற்பட்டது.""",
- language='ta')
-
->> ['இத்தொற்றுநோய் உலகளாவிய சமூக மற்றும் பொருளாதார சீர்குலைவை ஏற்படுத்தியுள்ளது.',
- 'இதனால் பெரும் பொருளாதார மந்தநிலைக்குப் பின்னர் உலகளவில் மிகப்பெரிய மந்தநிலை ஏற்பட்டுள்ளது.',
- 'இது விளையாட்டு,மத, அரசியல் மற்றும் கலாச்சார நிகழ்வுகளை ஒத்திவைக்க அல்லது ரத்து செய்ய வழிவகுத்தது.',
- 'அச்சம் காரணமாக முகக்கவசம், கிருமிநாசினி உள்ளிட்ட பொருட்களை அதிக நபர்கள் வாங்கியதால் விநியோகப் பற்றாக்குறை ஏற்பட்டது.']
-
-
-```
-
-Follow the colab notebook to setup the environment, download the trained _IndicTrans_ models and translating your own text.
-
-Command line interface --> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AI4Bharat/indicTrans/blob/main/indictrans_fairseq_inference.ipynb)
-
-
-Python interface --> [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AI4Bharat/indicTrans/blob/main/indicTrans_python_interface.ipynb)
-
- The python interface is useful in case you want to reuse the model for multiple translations and do not want to reinitialize the model each time
-
-
-## Finetuning the model on your input dataset
-
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AI4Bharat/indicTrans/blob/main/indicTrans_Finetuning.ipynb)
-
-The colab notebook can be used to setup the environment, download the trained _IndicTrans_ models and prepare your custom dataset for funetuning the indictrans model. There is also a section on mining indic to indic data from english centric corpus for finetuning indic to indic model.
-
-**Note**: Since this is a big model (400M params), you might not be able to train with reasonable batch sizes in the free google Colab account. We are planning to release smaller models (after pruning / distallation) soon.
-
-## Mining Indic to Indic pairs from english centric corpus
-
-The `extract_non_english_pairs` in `scripts/extract_non_english_pairs.py` can be used to mine indic to indic pairs from english centric corpus.
-
-As described in the [paper](https://arxiv.org/pdf/2104.05596.pdf) (section 2.5) , we use a very strict deduplication criterion to avoid the creation of very similar parallel sentences. For example, if an en sentence is aligned to *M* hi sentences and *N* ta sentences, then we would get *MN* hi-ta pairs. However, these pairs would be very similar and not contribute much to the training process. Hence, we retain only 1 randomly chosen pair out of these *MN* pairs.
-
-```bash
-extract_non_english_pairs(indir, outdir, LANGS):
- """
- Extracts non-english pair parallel corpora
- indir: contains english centric data in the following form:
- - directory named en-xx for language xx
- - each directory contains a train.en and train.xx
- outdir: output directory to store mined data for each pair.
- One directory is created for each pair.
- LANGS: list of languages in the corpus (other than English).
- The language codes must correspond to the ones used in the
- files and directories in indir. Prefarably, sort the languages
- in this list in alphabetic order. outdir will contain data for xx-yy,
- but not for yy-xx, so it will be convenient to have this list in sorted order.
- """
-```
-
-## Installation
-Click to expand
-
-```bash
-cd indicTrans
-git clone https://github.com/anoopkunchukuttan/indic_nlp_library.git
-git clone https://github.com/anoopkunchukuttan/indic_nlp_resources.git
-git clone https://github.com/rsennrich/subword-nmt.git
-# install required libraries
-pip install sacremoses pandas mock sacrebleu tensorboardX pyarrow indic-nlp-library
-
-# Install fairseq from source
-git clone https://github.com/pytorch/fairseq.git
-cd fairseq
-pip install --editable ./
-
-```
-
-
-## How to train the indictrans model on your training data?
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AI4Bharat/indicTrans/blob/main/IndicTrans_training.ipynb)
-
-
-Follow the colab notebook to setup the environment, download the dataset and train the indicTrans model
-
-## Network & Training Details
-
-- Architechture: IndicTrans uses 6 encoder and decoder layers, input embeddings of size 1536 with 16 attention heads and
-feedforward dimension of 4096 with total number of parameters of 434M
-- Loss: Cross entropy loss
-- Optimizer: Adam
-- Label Smoothing: 0.1
-- Gradient clipping: 1.0
-- Learning rate: 5e-4
-- Warmup_steps: 4000
-
-Please refer to section 4, 5 of our [paper](https://arxiv.org/ftp/arxiv/papers/2104/2104.05596.pdf) for more details on training/experimental setup.
-
-## Folder Structure
-```
-
-IndicTrans
-│ .gitignore
-│ apply_bpe_traindevtest_notag.sh # apply bpe for joint vocab (Train, dev and test)
-│ apply_single_bpe_traindevtest_notag.sh # apply bpe for seperate vocab (Train, dev and test)
-│ binarize_training_exp.sh # binarize the training data after preprocessing for fairseq-training
-│ compute_bleu.sh # Compute blue scores with postprocessing after translating with `joint_translate.sh`
-│ indictrans_fairseq_inference.ipynb # colab example to show how to use model for inference
-│ indicTrans_Finetuning.ipynb # colab example to show how to use model for finetuning on custom domain data
-│ joint_translate.sh # used for inference (see colab inference notebook for more details on usage)
-│ learn_bpe.sh # learning joint bpe on preprocessed text
-│ learn_single_bpe.sh # learning seperate bpe on preprocessed text
-│ LICENSE
-│ prepare_data.sh # prepare data given an experiment dir (this does preprocessing,
-│ # building vocab, binarization ) for bilingual training
-│ prepare_data_joint_training.sh # prepare data given an experiment dir (this does preprocessing,
-│ # building vocab, binarization ) for joint training
-│ README.md
-│
-├───legacy # old unused scripts
-├───model_configs # custom model configrations are stored here
-│ custom_transformer.py # contains custom 4x transformer models
-│ __init__.py
-├───inference
-│ custom_interactive.py # for python wrapper around fairseq-interactive
-│ engine.py # python interface for model inference
-└───scripts # stores python scripts that are used by other bash scripts
- │ add_joint_tags_translate.py # add lang tags to the processed training data for bilingual training
- │ add_tags_translate.py # add lang tags to the processed training data for joint training
- │ clean_vocab.py # clean vocabulary after building with subword_nmt
- │ concat_joint_data.py # concatenates lang pair data and creates text files to keep track
- │ # of number of lines in each lang pair.
- │ extract_non_english_pairs.py # Mining Indic to Indic pairs from english centric corpus
- │ postprocess_translate.py # Postprocesses translations
- │ preprocess_translate.py # Preprocess translations and for script conversion (from indic to devnagiri)
- │ remove_large_sentences.py # to remove large sentences from training data
- └───remove_train_devtest_overlaps.py # Finds and removes overlaped data of train with dev and test sets
-```
-
-
-## Citing
-
-If you are using any of the resources, please cite the following article:
-```
-@misc{ramesh2021samanantar,
- title={Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages},
- author={Gowtham Ramesh and Sumanth Doddapaneni and Aravinth Bheemaraj and Mayank Jobanputra and Raghavan AK and Ajitesh Sharma and Sujit Sahoo and Harshita Diddee and Mahalakshmi J and Divyanshu Kakwani and Navneet Kumar and Aswin Pradeep and Kumar Deepak and Vivek Raghavan and Anoop Kunchukuttan and Pratyush Kumar and Mitesh Shantadevi Khapra},
- year={2021},
- eprint={2104.05596},
- archivePrefix={arXiv},
- primaryClass={cs.CL}
-}
-```
-
-We would like to hear from you if:
-
-- You are using our resources. Please let us know how you are putting these resources to use.
-- You have any feedback on these resources.
-
-
-
-### License
-
-The IndicTrans code (and models) are released under the MIT License.
-
-
-### Contributors
-
-- Gowtham Ramesh, ([RBCDSAI](https://rbcdsai.iitm.ac.in), [IITM](https://www.iitm.ac.in))
-- Sumanth Doddapaneni, ([RBCDSAI](https://rbcdsai.iitm.ac.in), [IITM](https://www.iitm.ac.in))
-- Aravinth Bheemaraj, ([Tarento](https://www.linkedin.com/company/tarento-group/), [EkStep](https://ekstep.in))
-- Mayank Jobanputra, ([IITM](https://www.iitm.ac.in))
-- Raghavan AK, ([AI4Bharat](https://ai4bharat.org))
-- Ajitesh Sharma, ([Tarento](https://www.linkedin.com/company/tarento-group/), [EkStep](https://ekstep.in))
-- Sujit Sahoo, ([Tarento](https://www.linkedin.com/company/tarento-group/), [EkStep](https://ekstep.in))
-- Harshita Diddee, ([AI4Bharat](https://ai4bharat.org))
-- Mahalakshmi J, ([AI4Bharat](https://ai4bharat.org))
-- Divyanshu Kakwani, ([IITM](https://www.iitm.ac.in), [AI4Bharat](https://ai4bharat.org))
-- Navneet Kumar, ([Tarento](https://www.linkedin.com/company/tarento-group/), [EkStep](https://ekstep.in))
-- Aswin Pradeep, ([Tarento](https://www.linkedin.com/company/tarento-group/), [EkStep](https://ekstep.in))
-- Kumar Deepak, ([Tarento](https://www.linkedin.com/company/tarento-group/), [EkStep](https://ekstep.in))
-- Vivek Raghavan, ([EkStep](https://ekstep.in))
-- Anoop Kunchukuttan, ([Microsoft](https://www.microsoft.com/en-in/), [AI4Bharat](https://ai4bharat.org))
-- Pratyush Kumar, ([RBCDSAI](https://rbcdsai.iitm.ac.in), [AI4Bharat](https://ai4bharat.org), [IITM](https://www.iitm.ac.in))
-- Mitesh Shantadevi Khapra, ([RBCDSAI](https://rbcdsai.iitm.ac.in), [AI4Bharat](https://ai4bharat.org), [IITM](https://www.iitm.ac.in))
-
-
-
-### Contact
-
-- Anoop Kunchukuttan ([anoop.kunchukuttan@gmail.com](mailto:anoop.kunchukuttan@gmail.com))
-- Mitesh Khapra ([miteshk@cse.iitm.ac.in](mailto:miteshk@cse.iitm.ac.in))
-- Pratyush Kumar ([pratyush@cse.iitm.ac.in](mailto:pratyush@cse.iitm.ac.in))
diff --git a/indicTrans/indicTrans_Finetuning.ipynb b/indicTrans_Finetuning.ipynb
similarity index 100%
rename from indicTrans/indicTrans_Finetuning.ipynb
rename to indicTrans_Finetuning.ipynb
diff --git a/indicTrans/indicTrans_python_interface.ipynb b/indicTrans_python_interface.ipynb
similarity index 100%
rename from indicTrans/indicTrans_python_interface.ipynb
rename to indicTrans_python_interface.ipynb
diff --git a/indicTrans/indic_nlp_library/LICENSE b/indic_nlp_library/LICENSE
similarity index 100%
rename from indicTrans/indic_nlp_library/LICENSE
rename to indic_nlp_library/LICENSE
diff --git a/indicTrans/indic_nlp_library/README.md b/indic_nlp_library/README.md
similarity index 100%
rename from indicTrans/indic_nlp_library/README.md
rename to indic_nlp_library/README.md
diff --git a/indicTrans/indic_nlp_library/contrib/README.md b/indic_nlp_library/contrib/README.md
similarity index 100%
rename from indicTrans/indic_nlp_library/contrib/README.md
rename to indic_nlp_library/contrib/README.md
diff --git a/indicTrans/indic_nlp_library/contrib/correct_moses_tokenizer.py b/indic_nlp_library/contrib/correct_moses_tokenizer.py
similarity index 100%
rename from indicTrans/indic_nlp_library/contrib/correct_moses_tokenizer.py
rename to indic_nlp_library/contrib/correct_moses_tokenizer.py
diff --git a/indicTrans/indic_nlp_library/contrib/hindi_to_kannada_transliterator.py b/indic_nlp_library/contrib/hindi_to_kannada_transliterator.py
similarity index 100%
rename from indicTrans/indic_nlp_library/contrib/hindi_to_kannada_transliterator.py
rename to indic_nlp_library/contrib/hindi_to_kannada_transliterator.py
diff --git a/indicTrans/indic_nlp_library/contrib/indic_scraper_project_sample.ipynb b/indic_nlp_library/contrib/indic_scraper_project_sample.ipynb
similarity index 100%
rename from indicTrans/indic_nlp_library/contrib/indic_scraper_project_sample.ipynb
rename to indic_nlp_library/contrib/indic_scraper_project_sample.ipynb
diff --git a/indicTrans/indic_nlp_library/docs/Makefile b/indic_nlp_library/docs/Makefile
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/Makefile
rename to indic_nlp_library/docs/Makefile
diff --git a/indicTrans/indic_nlp_library/docs/cmd.rst b/indic_nlp_library/docs/cmd.rst
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/cmd.rst
rename to indic_nlp_library/docs/cmd.rst
diff --git a/indicTrans/indic_nlp_library/docs/code.rst b/indic_nlp_library/docs/code.rst
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/code.rst
rename to indic_nlp_library/docs/code.rst
diff --git a/indicTrans/indic_nlp_library/docs/conf.py b/indic_nlp_library/docs/conf.py
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/conf.py
rename to indic_nlp_library/docs/conf.py
diff --git a/indicTrans/indic_nlp_library/docs/index.rst b/indic_nlp_library/docs/index.rst
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/index.rst
rename to indic_nlp_library/docs/index.rst
diff --git a/indicTrans/indic_nlp_library/docs/indicnlp.MD b/indic_nlp_library/docs/indicnlp.MD
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/indicnlp.MD
rename to indic_nlp_library/docs/indicnlp.MD
diff --git a/indicTrans/indic_nlp_library/docs/indicnlp.cli.rst b/indic_nlp_library/docs/indicnlp.cli.rst
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/indicnlp.cli.rst
rename to indic_nlp_library/docs/indicnlp.cli.rst
diff --git a/indicTrans/indic_nlp_library/docs/indicnlp.morph.rst b/indic_nlp_library/docs/indicnlp.morph.rst
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/indicnlp.morph.rst
rename to indic_nlp_library/docs/indicnlp.morph.rst
diff --git a/indicTrans/indic_nlp_library/docs/indicnlp.normalize.rst b/indic_nlp_library/docs/indicnlp.normalize.rst
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/indicnlp.normalize.rst
rename to indic_nlp_library/docs/indicnlp.normalize.rst
diff --git a/indicTrans/indic_nlp_library/docs/indicnlp.pdf b/indic_nlp_library/docs/indicnlp.pdf
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/indicnlp.pdf
rename to indic_nlp_library/docs/indicnlp.pdf
diff --git a/indicTrans/indic_nlp_library/docs/indicnlp.rst b/indic_nlp_library/docs/indicnlp.rst
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/indicnlp.rst
rename to indic_nlp_library/docs/indicnlp.rst
diff --git a/indicTrans/indic_nlp_library/docs/indicnlp.script.rst b/indic_nlp_library/docs/indicnlp.script.rst
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/indicnlp.script.rst
rename to indic_nlp_library/docs/indicnlp.script.rst
diff --git a/indicTrans/indic_nlp_library/docs/indicnlp.syllable.rst b/indic_nlp_library/docs/indicnlp.syllable.rst
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/indicnlp.syllable.rst
rename to indic_nlp_library/docs/indicnlp.syllable.rst
diff --git a/indicTrans/indic_nlp_library/docs/indicnlp.tokenize.rst b/indic_nlp_library/docs/indicnlp.tokenize.rst
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/indicnlp.tokenize.rst
rename to indic_nlp_library/docs/indicnlp.tokenize.rst
diff --git a/indicTrans/indic_nlp_library/docs/indicnlp.transliterate.rst b/indic_nlp_library/docs/indicnlp.transliterate.rst
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/indicnlp.transliterate.rst
rename to indic_nlp_library/docs/indicnlp.transliterate.rst
diff --git a/indicTrans/indic_nlp_library/docs/make.bat b/indic_nlp_library/docs/make.bat
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/make.bat
rename to indic_nlp_library/docs/make.bat
diff --git a/indicTrans/indic_nlp_library/docs/modules.rst b/indic_nlp_library/docs/modules.rst
similarity index 100%
rename from indicTrans/indic_nlp_library/docs/modules.rst
rename to indic_nlp_library/docs/modules.rst
diff --git a/indicTrans/indic_nlp_library/indicnlp/__init__.py b/indic_nlp_library/indicnlp/__init__.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/__init__.py
rename to indic_nlp_library/indicnlp/__init__.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/cli/__init__.py b/indic_nlp_library/indicnlp/cli/__init__.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/cli/__init__.py
rename to indic_nlp_library/indicnlp/cli/__init__.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/cli/cliparser.py b/indic_nlp_library/indicnlp/cli/cliparser.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/cli/cliparser.py
rename to indic_nlp_library/indicnlp/cli/cliparser.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/common.py b/indic_nlp_library/indicnlp/common.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/common.py
rename to indic_nlp_library/indicnlp/common.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/langinfo.py b/indic_nlp_library/indicnlp/langinfo.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/langinfo.py
rename to indic_nlp_library/indicnlp/langinfo.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/loader.py b/indic_nlp_library/indicnlp/loader.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/loader.py
rename to indic_nlp_library/indicnlp/loader.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/morph/__init__.py b/indic_nlp_library/indicnlp/morph/__init__.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/morph/__init__.py
rename to indic_nlp_library/indicnlp/morph/__init__.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/morph/unsupervised_morph.py b/indic_nlp_library/indicnlp/morph/unsupervised_morph.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/morph/unsupervised_morph.py
rename to indic_nlp_library/indicnlp/morph/unsupervised_morph.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/normalize/__init__.py b/indic_nlp_library/indicnlp/normalize/__init__.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/normalize/__init__.py
rename to indic_nlp_library/indicnlp/normalize/__init__.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/normalize/indic_normalize.py b/indic_nlp_library/indicnlp/normalize/indic_normalize.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/normalize/indic_normalize.py
rename to indic_nlp_library/indicnlp/normalize/indic_normalize.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/script/__init__.py b/indic_nlp_library/indicnlp/script/__init__.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/script/__init__.py
rename to indic_nlp_library/indicnlp/script/__init__.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/script/english_script.py b/indic_nlp_library/indicnlp/script/english_script.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/script/english_script.py
rename to indic_nlp_library/indicnlp/script/english_script.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/script/indic_scripts.py b/indic_nlp_library/indicnlp/script/indic_scripts.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/script/indic_scripts.py
rename to indic_nlp_library/indicnlp/script/indic_scripts.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/script/phonetic_sim.py b/indic_nlp_library/indicnlp/script/phonetic_sim.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/script/phonetic_sim.py
rename to indic_nlp_library/indicnlp/script/phonetic_sim.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/syllable/__init__.py b/indic_nlp_library/indicnlp/syllable/__init__.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/syllable/__init__.py
rename to indic_nlp_library/indicnlp/syllable/__init__.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/syllable/syllabifier.py b/indic_nlp_library/indicnlp/syllable/syllabifier.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/syllable/syllabifier.py
rename to indic_nlp_library/indicnlp/syllable/syllabifier.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/test/__init__.py b/indic_nlp_library/indicnlp/test/__init__.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/test/__init__.py
rename to indic_nlp_library/indicnlp/test/__init__.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/test/unit/__init__.py b/indic_nlp_library/indicnlp/test/unit/__init__.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/test/unit/__init__.py
rename to indic_nlp_library/indicnlp/test/unit/__init__.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/tokenize/__init__.py b/indic_nlp_library/indicnlp/tokenize/__init__.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/tokenize/__init__.py
rename to indic_nlp_library/indicnlp/tokenize/__init__.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/tokenize/indic_detokenize.py b/indic_nlp_library/indicnlp/tokenize/indic_detokenize.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/tokenize/indic_detokenize.py
rename to indic_nlp_library/indicnlp/tokenize/indic_detokenize.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/tokenize/indic_tokenize.py b/indic_nlp_library/indicnlp/tokenize/indic_tokenize.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/tokenize/indic_tokenize.py
rename to indic_nlp_library/indicnlp/tokenize/indic_tokenize.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/tokenize/sentence_tokenize.py b/indic_nlp_library/indicnlp/tokenize/sentence_tokenize.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/tokenize/sentence_tokenize.py
rename to indic_nlp_library/indicnlp/tokenize/sentence_tokenize.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/transliterate/__init__.py b/indic_nlp_library/indicnlp/transliterate/__init__.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/transliterate/__init__.py
rename to indic_nlp_library/indicnlp/transliterate/__init__.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/transliterate/acronym_transliterator.py b/indic_nlp_library/indicnlp/transliterate/acronym_transliterator.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/transliterate/acronym_transliterator.py
rename to indic_nlp_library/indicnlp/transliterate/acronym_transliterator.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/transliterate/script_unifier.py b/indic_nlp_library/indicnlp/transliterate/script_unifier.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/transliterate/script_unifier.py
rename to indic_nlp_library/indicnlp/transliterate/script_unifier.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/transliterate/sinhala_transliterator.py b/indic_nlp_library/indicnlp/transliterate/sinhala_transliterator.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/transliterate/sinhala_transliterator.py
rename to indic_nlp_library/indicnlp/transliterate/sinhala_transliterator.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/transliterate/unicode_transliterate.py b/indic_nlp_library/indicnlp/transliterate/unicode_transliterate.py
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/transliterate/unicode_transliterate.py
rename to indic_nlp_library/indicnlp/transliterate/unicode_transliterate.py
diff --git a/indicTrans/indic_nlp_library/indicnlp/version.txt b/indic_nlp_library/indicnlp/version.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/indicnlp/version.txt
rename to indic_nlp_library/indicnlp/version.txt
diff --git a/indicTrans/indic_nlp_library/requirements.txt b/indic_nlp_library/requirements.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/requirements.txt
rename to indic_nlp_library/requirements.txt
diff --git a/indicTrans/indic_nlp_library/setup.py b/indic_nlp_library/setup.py
similarity index 100%
rename from indicTrans/indic_nlp_library/setup.py
rename to indic_nlp_library/setup.py
diff --git a/indicTrans/indic_nlp_library/test_data/morph/mr.txt b/indic_nlp_library/test_data/morph/mr.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/morph/mr.txt
rename to indic_nlp_library/test_data/morph/mr.txt
diff --git a/indicTrans/indic_nlp_library/test_data/normalize/bn.txt b/indic_nlp_library/test_data/normalize/bn.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/normalize/bn.txt
rename to indic_nlp_library/test_data/normalize/bn.txt
diff --git a/indicTrans/indic_nlp_library/test_data/normalize/en.txt b/indic_nlp_library/test_data/normalize/en.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/normalize/en.txt
rename to indic_nlp_library/test_data/normalize/en.txt
diff --git a/indicTrans/indic_nlp_library/test_data/normalize/gu.txt b/indic_nlp_library/test_data/normalize/gu.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/normalize/gu.txt
rename to indic_nlp_library/test_data/normalize/gu.txt
diff --git a/indicTrans/indic_nlp_library/test_data/normalize/hi.txt b/indic_nlp_library/test_data/normalize/hi.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/normalize/hi.txt
rename to indic_nlp_library/test_data/normalize/hi.txt
diff --git a/indicTrans/indic_nlp_library/test_data/normalize/kK.txt b/indic_nlp_library/test_data/normalize/kK.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/normalize/kK.txt
rename to indic_nlp_library/test_data/normalize/kK.txt
diff --git a/indicTrans/indic_nlp_library/test_data/normalize/ma.txt b/indic_nlp_library/test_data/normalize/ma.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/normalize/ma.txt
rename to indic_nlp_library/test_data/normalize/ma.txt
diff --git a/indicTrans/indic_nlp_library/test_data/normalize/mr.txt b/indic_nlp_library/test_data/normalize/mr.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/normalize/mr.txt
rename to indic_nlp_library/test_data/normalize/mr.txt
diff --git a/indicTrans/indic_nlp_library/test_data/normalize/pa.txt b/indic_nlp_library/test_data/normalize/pa.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/normalize/pa.txt
rename to indic_nlp_library/test_data/normalize/pa.txt
diff --git a/indicTrans/indic_nlp_library/test_data/normalize/ta.txt b/indic_nlp_library/test_data/normalize/ta.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/normalize/ta.txt
rename to indic_nlp_library/test_data/normalize/ta.txt
diff --git a/indicTrans/indic_nlp_library/test_data/normalize/te.txt b/indic_nlp_library/test_data/normalize/te.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/normalize/te.txt
rename to indic_nlp_library/test_data/normalize/te.txt
diff --git a/indicTrans/indic_nlp_library/test_data/normalize/ur.txt b/indic_nlp_library/test_data/normalize/ur.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/normalize/ur.txt
rename to indic_nlp_library/test_data/normalize/ur.txt
diff --git a/indicTrans/indic_nlp_library/test_data/tokenize/trivial.txt b/indic_nlp_library/test_data/tokenize/trivial.txt
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/tokenize/trivial.txt
rename to indic_nlp_library/test_data/tokenize/trivial.txt
diff --git a/indicTrans/indic_nlp_library/test_data/transliterate.ipynb b/indic_nlp_library/test_data/transliterate.ipynb
similarity index 100%
rename from indicTrans/indic_nlp_library/test_data/transliterate.ipynb
rename to indic_nlp_library/test_data/transliterate.ipynb
diff --git a/indicTrans/indic_nlp_resources/README.md b/indic_nlp_resources/README.md
similarity index 100%
rename from indicTrans/indic_nlp_resources/README.md
rename to indic_nlp_resources/README.md
diff --git a/indicTrans/indic_nlp_resources/morph/morfessor/bn.model b/indic_nlp_resources/morph/morfessor/bn.model
similarity index 100%
rename from indicTrans/indic_nlp_resources/morph/morfessor/bn.model
rename to indic_nlp_resources/morph/morfessor/bn.model
diff --git a/indicTrans/indic_nlp_resources/morph/morfessor/gu.model b/indic_nlp_resources/morph/morfessor/gu.model
similarity index 100%
rename from indicTrans/indic_nlp_resources/morph/morfessor/gu.model
rename to indic_nlp_resources/morph/morfessor/gu.model
diff --git a/indicTrans/indic_nlp_resources/morph/morfessor/hi.model b/indic_nlp_resources/morph/morfessor/hi.model
similarity index 100%
rename from indicTrans/indic_nlp_resources/morph/morfessor/hi.model
rename to indic_nlp_resources/morph/morfessor/hi.model
diff --git a/indicTrans/indic_nlp_resources/morph/morfessor/kK.model b/indic_nlp_resources/morph/morfessor/kK.model
similarity index 100%
rename from indicTrans/indic_nlp_resources/morph/morfessor/kK.model
rename to indic_nlp_resources/morph/morfessor/kK.model
diff --git a/indicTrans/indic_nlp_resources/morph/morfessor/kn.model b/indic_nlp_resources/morph/morfessor/kn.model
similarity index 100%
rename from indicTrans/indic_nlp_resources/morph/morfessor/kn.model
rename to indic_nlp_resources/morph/morfessor/kn.model
diff --git a/indicTrans/indic_nlp_resources/morph/morfessor/ml.model b/indic_nlp_resources/morph/morfessor/ml.model
similarity index 100%
rename from indicTrans/indic_nlp_resources/morph/morfessor/ml.model
rename to indic_nlp_resources/morph/morfessor/ml.model
diff --git a/indicTrans/indic_nlp_resources/morph/morfessor/mr.model b/indic_nlp_resources/morph/morfessor/mr.model
similarity index 100%
rename from indicTrans/indic_nlp_resources/morph/morfessor/mr.model
rename to indic_nlp_resources/morph/morfessor/mr.model
diff --git a/indicTrans/indic_nlp_resources/morph/morfessor/pa.model b/indic_nlp_resources/morph/morfessor/pa.model
similarity index 100%
rename from indicTrans/indic_nlp_resources/morph/morfessor/pa.model
rename to indic_nlp_resources/morph/morfessor/pa.model
diff --git a/indicTrans/indic_nlp_resources/morph/morfessor/sa.model b/indic_nlp_resources/morph/morfessor/sa.model
similarity index 100%
rename from indicTrans/indic_nlp_resources/morph/morfessor/sa.model
rename to indic_nlp_resources/morph/morfessor/sa.model
diff --git a/indicTrans/indic_nlp_resources/morph/morfessor/ta.model b/indic_nlp_resources/morph/morfessor/ta.model
similarity index 100%
rename from indicTrans/indic_nlp_resources/morph/morfessor/ta.model
rename to indic_nlp_resources/morph/morfessor/ta.model
diff --git a/indicTrans/indic_nlp_resources/morph/morfessor/te.model b/indic_nlp_resources/morph/morfessor/te.model
similarity index 100%
rename from indicTrans/indic_nlp_resources/morph/morfessor/te.model
rename to indic_nlp_resources/morph/morfessor/te.model
diff --git a/indicTrans/indic_nlp_resources/morph/morfessor/ur.model b/indic_nlp_resources/morph/morfessor/ur.model
similarity index 100%
rename from indicTrans/indic_nlp_resources/morph/morfessor/ur.model
rename to indic_nlp_resources/morph/morfessor/ur.model
diff --git a/indicTrans/indic_nlp_resources/script/all_script_phonetic_data.csv b/indic_nlp_resources/script/all_script_phonetic_data.csv
similarity index 100%
rename from indicTrans/indic_nlp_resources/script/all_script_phonetic_data.csv
rename to indic_nlp_resources/script/all_script_phonetic_data.csv
diff --git a/indicTrans/indic_nlp_resources/script/all_script_phonetic_data.xlsx b/indic_nlp_resources/script/all_script_phonetic_data.xlsx
similarity index 100%
rename from indicTrans/indic_nlp_resources/script/all_script_phonetic_data.xlsx
rename to indic_nlp_resources/script/all_script_phonetic_data.xlsx
diff --git a/indicTrans/indic_nlp_resources/script/arpabet.pdf b/indic_nlp_resources/script/arpabet.pdf
similarity index 100%
rename from indicTrans/indic_nlp_resources/script/arpabet.pdf
rename to indic_nlp_resources/script/arpabet.pdf
diff --git a/indicTrans/indic_nlp_resources/script/english_arpabet_list.csv b/indic_nlp_resources/script/english_arpabet_list.csv
similarity index 100%
rename from indicTrans/indic_nlp_resources/script/english_arpabet_list.csv
rename to indic_nlp_resources/script/english_arpabet_list.csv
diff --git a/indicTrans/indic_nlp_resources/script/english_script_phonetic_data.csv b/indic_nlp_resources/script/english_script_phonetic_data.csv
similarity index 100%
rename from indicTrans/indic_nlp_resources/script/english_script_phonetic_data.csv
rename to indic_nlp_resources/script/english_script_phonetic_data.csv
diff --git a/indicTrans/indic_nlp_resources/script/english_script_phonetic_data.xlsx b/indic_nlp_resources/script/english_script_phonetic_data.xlsx
similarity index 100%
rename from indicTrans/indic_nlp_resources/script/english_script_phonetic_data.xlsx
rename to indic_nlp_resources/script/english_script_phonetic_data.xlsx
diff --git a/indicTrans/indic_nlp_resources/script/tamil_script_phonetic_data.csv b/indic_nlp_resources/script/tamil_script_phonetic_data.csv
similarity index 100%
rename from indicTrans/indic_nlp_resources/script/tamil_script_phonetic_data.csv
rename to indic_nlp_resources/script/tamil_script_phonetic_data.csv
diff --git a/indicTrans/indic_nlp_resources/script/tamil_script_phonetic_data.xlsx b/indic_nlp_resources/script/tamil_script_phonetic_data.xlsx
similarity index 100%
rename from indicTrans/indic_nlp_resources/script/tamil_script_phonetic_data.xlsx
rename to indic_nlp_resources/script/tamil_script_phonetic_data.xlsx
diff --git a/indicTrans/indic_nlp_resources/transliterate/README.md b/indic_nlp_resources/transliterate/README.md
similarity index 100%
rename from indicTrans/indic_nlp_resources/transliterate/README.md
rename to indic_nlp_resources/transliterate/README.md
diff --git a/indicTrans/indic_nlp_resources/transliterate/bn-hi.zip b/indic_nlp_resources/transliterate/bn-hi.zip
similarity index 100%
rename from indicTrans/indic_nlp_resources/transliterate/bn-hi.zip
rename to indic_nlp_resources/transliterate/bn-hi.zip
diff --git a/indicTrans/indic_nlp_resources/transliterate/en-hi.zip b/indic_nlp_resources/transliterate/en-hi.zip
similarity index 100%
rename from indicTrans/indic_nlp_resources/transliterate/en-hi.zip
rename to indic_nlp_resources/transliterate/en-hi.zip
diff --git a/indicTrans/indic_nlp_resources/transliterate/mr-hi.zip b/indic_nlp_resources/transliterate/mr-hi.zip
similarity index 100%
rename from indicTrans/indic_nlp_resources/transliterate/mr-hi.zip
rename to indic_nlp_resources/transliterate/mr-hi.zip
diff --git a/indicTrans/indic_nlp_resources/transliterate/offset_itrans_map.csv b/indic_nlp_resources/transliterate/offset_itrans_map.csv
similarity index 100%
rename from indicTrans/indic_nlp_resources/transliterate/offset_itrans_map.csv
rename to indic_nlp_resources/transliterate/offset_itrans_map.csv
diff --git a/indicTrans/indic_nlp_resources/transliterate/ta-hi.zip b/indic_nlp_resources/transliterate/ta-hi.zip
similarity index 100%
rename from indicTrans/indic_nlp_resources/transliterate/ta-hi.zip
rename to indic_nlp_resources/transliterate/ta-hi.zip
diff --git a/indicTrans/indic_nlp_resources/transliterate/te-hi.zip b/indic_nlp_resources/transliterate/te-hi.zip
similarity index 100%
rename from indicTrans/indic_nlp_resources/transliterate/te-hi.zip
rename to indic_nlp_resources/transliterate/te-hi.zip
diff --git a/indicTrans/indictrans_fairseq_inference.ipynb b/indictrans_fairseq_inference.ipynb
similarity index 100%
rename from indicTrans/indictrans_fairseq_inference.ipynb
rename to indictrans_fairseq_inference.ipynb
diff --git a/indicTrans/inference/__init__.py b/inference/__init__.py
similarity index 100%
rename from indicTrans/inference/__init__.py
rename to inference/__init__.py
diff --git a/indicTrans/inference/custom_interactive.py b/inference/custom_interactive.py
similarity index 100%
rename from indicTrans/inference/custom_interactive.py
rename to inference/custom_interactive.py
diff --git a/indicTrans/inference/engine.py b/inference/engine.py
similarity index 100%
rename from indicTrans/inference/engine.py
rename to inference/engine.py
diff --git a/indicTrans/interface/index.html b/interface/index.html
similarity index 100%
rename from indicTrans/interface/index.html
rename to interface/index.html
diff --git a/indicTrans/interface/logo.png b/interface/logo.png
similarity index 100%
rename from indicTrans/interface/logo.png
rename to interface/logo.png
diff --git a/indicTrans/joint_translate.sh b/joint_translate.sh
similarity index 100%
rename from indicTrans/joint_translate.sh
rename to joint_translate.sh
diff --git a/indicTrans/learn_bpe.sh b/learn_bpe.sh
similarity index 100%
rename from indicTrans/learn_bpe.sh
rename to learn_bpe.sh
diff --git a/indicTrans/learn_single_bpe.sh b/learn_single_bpe.sh
similarity index 100%
rename from indicTrans/learn_single_bpe.sh
rename to learn_single_bpe.sh
diff --git a/indicTrans/legacy/apply_bpe_test_valid_notag.sh b/legacy/apply_bpe_test_valid_notag.sh
similarity index 100%
rename from indicTrans/legacy/apply_bpe_test_valid_notag.sh
rename to legacy/apply_bpe_test_valid_notag.sh
diff --git a/indicTrans/legacy/apply_bpe_train_notag.sh b/legacy/apply_bpe_train_notag.sh
similarity index 100%
rename from indicTrans/legacy/apply_bpe_train_notag.sh
rename to legacy/apply_bpe_train_notag.sh
diff --git a/indicTrans/legacy/env.sh b/legacy/env.sh
similarity index 100%
rename from indicTrans/legacy/env.sh
rename to legacy/env.sh
diff --git a/indicTrans/legacy/indictrans_workflow.ipynb b/legacy/indictrans_workflow.ipynb
similarity index 100%
rename from indicTrans/legacy/indictrans_workflow.ipynb
rename to legacy/indictrans_workflow.ipynb
diff --git a/indicTrans/legacy/install_fairseq.sh b/legacy/install_fairseq.sh
similarity index 100%
rename from indicTrans/legacy/install_fairseq.sh
rename to legacy/install_fairseq.sh
diff --git a/indicTrans/legacy/run_inference.sh b/legacy/run_inference.sh
similarity index 100%
rename from indicTrans/legacy/run_inference.sh
rename to legacy/run_inference.sh
diff --git a/indicTrans/legacy/run_joint_inference.sh b/legacy/run_joint_inference.sh
similarity index 100%
rename from indicTrans/legacy/run_joint_inference.sh
rename to legacy/run_joint_inference.sh
diff --git a/indicTrans/legacy/tpu_training_instructions.md b/legacy/tpu_training_instructions.md
similarity index 100%
rename from indicTrans/legacy/tpu_training_instructions.md
rename to legacy/tpu_training_instructions.md
diff --git a/indicTrans/legacy/translate.sh b/legacy/translate.sh
similarity index 100%
rename from indicTrans/legacy/translate.sh
rename to legacy/translate.sh
diff --git a/indicTrans/model_configs/__init__.py b/model_configs/__init__.py
similarity index 100%
rename from indicTrans/model_configs/__init__.py
rename to model_configs/__init__.py
diff --git a/indicTrans/model_configs/custom_transformer.py b/model_configs/custom_transformer.py
similarity index 100%
rename from indicTrans/model_configs/custom_transformer.py
rename to model_configs/custom_transformer.py
diff --git a/indicTrans/prepare_data.sh b/prepare_data.sh
similarity index 100%
rename from indicTrans/prepare_data.sh
rename to prepare_data.sh
diff --git a/indicTrans/prepare_data_joint_training.sh b/prepare_data_joint_training.sh
similarity index 100%
rename from indicTrans/prepare_data_joint_training.sh
rename to prepare_data_joint_training.sh
diff --git a/indicTrans/scripts/__init__.py b/scripts/__init__.py
similarity index 100%
rename from indicTrans/scripts/__init__.py
rename to scripts/__init__.py
diff --git a/indicTrans/scripts/add_joint_tags_translate.py b/scripts/add_joint_tags_translate.py
similarity index 100%
rename from indicTrans/scripts/add_joint_tags_translate.py
rename to scripts/add_joint_tags_translate.py
diff --git a/indicTrans/scripts/add_tags_translate.py b/scripts/add_tags_translate.py
similarity index 100%
rename from indicTrans/scripts/add_tags_translate.py
rename to scripts/add_tags_translate.py
diff --git a/indicTrans/scripts/clean_vocab.py b/scripts/clean_vocab.py
similarity index 100%
rename from indicTrans/scripts/clean_vocab.py
rename to scripts/clean_vocab.py
diff --git a/indicTrans/scripts/concat_joint_data.py b/scripts/concat_joint_data.py
similarity index 100%
rename from indicTrans/scripts/concat_joint_data.py
rename to scripts/concat_joint_data.py
diff --git a/indicTrans/scripts/extract_non_english_pairs.py b/scripts/extract_non_english_pairs.py
similarity index 100%
rename from indicTrans/scripts/extract_non_english_pairs.py
rename to scripts/extract_non_english_pairs.py
diff --git a/indicTrans/scripts/postprocess_translate.py b/scripts/postprocess_translate.py
similarity index 100%
rename from indicTrans/scripts/postprocess_translate.py
rename to scripts/postprocess_translate.py
diff --git a/indicTrans/scripts/preprocess_translate.py b/scripts/preprocess_translate.py
similarity index 100%
rename from indicTrans/scripts/preprocess_translate.py
rename to scripts/preprocess_translate.py
diff --git a/indicTrans/scripts/remove_large_sentences.py b/scripts/remove_large_sentences.py
similarity index 100%
rename from indicTrans/scripts/remove_large_sentences.py
rename to scripts/remove_large_sentences.py
diff --git a/indicTrans/scripts/remove_train_devtest_overlaps.py b/scripts/remove_train_devtest_overlaps.py
similarity index 100%
rename from indicTrans/scripts/remove_train_devtest_overlaps.py
rename to scripts/remove_train_devtest_overlaps.py
diff --git a/indicTrans/subword-nmt/.github/workflows/pythonpublish.yml b/subword-nmt/.github/workflows/pythonpublish.yml
similarity index 100%
rename from indicTrans/subword-nmt/.github/workflows/pythonpublish.yml
rename to subword-nmt/.github/workflows/pythonpublish.yml
diff --git a/indicTrans/subword-nmt/.gitignore b/subword-nmt/.gitignore
similarity index 100%
rename from indicTrans/subword-nmt/.gitignore
rename to subword-nmt/.gitignore
diff --git a/indicTrans/subword-nmt/CHANGELOG.md b/subword-nmt/CHANGELOG.md
similarity index 100%
rename from indicTrans/subword-nmt/CHANGELOG.md
rename to subword-nmt/CHANGELOG.md
diff --git a/indicTrans/subword-nmt/LICENSE b/subword-nmt/LICENSE
similarity index 100%
rename from indicTrans/subword-nmt/LICENSE
rename to subword-nmt/LICENSE
diff --git a/indicTrans/subword-nmt/README.md b/subword-nmt/README.md
similarity index 100%
rename from indicTrans/subword-nmt/README.md
rename to subword-nmt/README.md
diff --git a/indicTrans/subword-nmt/apply_bpe.py b/subword-nmt/apply_bpe.py
similarity index 100%
rename from indicTrans/subword-nmt/apply_bpe.py
rename to subword-nmt/apply_bpe.py
diff --git a/indicTrans/subword-nmt/get_vocab.py b/subword-nmt/get_vocab.py
similarity index 100%
rename from indicTrans/subword-nmt/get_vocab.py
rename to subword-nmt/get_vocab.py
diff --git a/indicTrans/subword-nmt/learn_bpe.py b/subword-nmt/learn_bpe.py
similarity index 100%
rename from indicTrans/subword-nmt/learn_bpe.py
rename to subword-nmt/learn_bpe.py
diff --git a/indicTrans/subword-nmt/learn_joint_bpe_and_vocab.py b/subword-nmt/learn_joint_bpe_and_vocab.py
similarity index 100%
rename from indicTrans/subword-nmt/learn_joint_bpe_and_vocab.py
rename to subword-nmt/learn_joint_bpe_and_vocab.py
diff --git a/indicTrans/subword-nmt/setup.py b/subword-nmt/setup.py
similarity index 100%
rename from indicTrans/subword-nmt/setup.py
rename to subword-nmt/setup.py
diff --git a/indicTrans/subword-nmt/subword_nmt/__init__.py b/subword-nmt/subword_nmt/__init__.py
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/__init__.py
rename to subword-nmt/subword_nmt/__init__.py
diff --git a/indicTrans/subword-nmt/subword_nmt/apply_bpe.py b/subword-nmt/subword_nmt/apply_bpe.py
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/apply_bpe.py
rename to subword-nmt/subword_nmt/apply_bpe.py
diff --git a/indicTrans/subword-nmt/subword_nmt/bpe_toy.py b/subword-nmt/subword_nmt/bpe_toy.py
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/bpe_toy.py
rename to subword-nmt/subword_nmt/bpe_toy.py
diff --git a/indicTrans/subword-nmt/subword_nmt/chrF.py b/subword-nmt/subword_nmt/chrF.py
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/chrF.py
rename to subword-nmt/subword_nmt/chrF.py
diff --git a/indicTrans/subword-nmt/subword_nmt/get_vocab.py b/subword-nmt/subword_nmt/get_vocab.py
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/get_vocab.py
rename to subword-nmt/subword_nmt/get_vocab.py
diff --git a/indicTrans/subword-nmt/subword_nmt/learn_bpe.py b/subword-nmt/subword_nmt/learn_bpe.py
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/learn_bpe.py
rename to subword-nmt/subword_nmt/learn_bpe.py
diff --git a/indicTrans/subword-nmt/subword_nmt/learn_joint_bpe_and_vocab.py b/subword-nmt/subword_nmt/learn_joint_bpe_and_vocab.py
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/learn_joint_bpe_and_vocab.py
rename to subword-nmt/subword_nmt/learn_joint_bpe_and_vocab.py
diff --git a/indicTrans/subword-nmt/subword_nmt/segment_char_ngrams.py b/subword-nmt/subword_nmt/segment_char_ngrams.py
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/segment_char_ngrams.py
rename to subword-nmt/subword_nmt/segment_char_ngrams.py
diff --git a/indicTrans/subword-nmt/subword_nmt/subword_nmt.py b/subword-nmt/subword_nmt/subword_nmt.py
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/subword_nmt.py
rename to subword-nmt/subword_nmt/subword_nmt.py
diff --git a/indicTrans/subword-nmt/subword_nmt/tests/__init__.py b/subword-nmt/subword_nmt/tests/__init__.py
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/tests/__init__.py
rename to subword-nmt/subword_nmt/tests/__init__.py
diff --git a/indicTrans/subword-nmt/subword_nmt/tests/data/.gitignore b/subword-nmt/subword_nmt/tests/data/.gitignore
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/tests/data/.gitignore
rename to subword-nmt/subword_nmt/tests/data/.gitignore
diff --git a/indicTrans/subword-nmt/subword_nmt/tests/data/bpe.ref b/subword-nmt/subword_nmt/tests/data/bpe.ref
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/tests/data/bpe.ref
rename to subword-nmt/subword_nmt/tests/data/bpe.ref
diff --git a/indicTrans/subword-nmt/subword_nmt/tests/data/corpus.bpe.ref.en b/subword-nmt/subword_nmt/tests/data/corpus.bpe.ref.en
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/tests/data/corpus.bpe.ref.en
rename to subword-nmt/subword_nmt/tests/data/corpus.bpe.ref.en
diff --git a/indicTrans/subword-nmt/subword_nmt/tests/data/corpus.en b/subword-nmt/subword_nmt/tests/data/corpus.en
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/tests/data/corpus.en
rename to subword-nmt/subword_nmt/tests/data/corpus.en
diff --git a/indicTrans/subword-nmt/subword_nmt/tests/test_bpe.py b/subword-nmt/subword_nmt/tests/test_bpe.py
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/tests/test_bpe.py
rename to subword-nmt/subword_nmt/tests/test_bpe.py
diff --git a/indicTrans/subword-nmt/subword_nmt/tests/test_glossaries.py b/subword-nmt/subword_nmt/tests/test_glossaries.py
similarity index 100%
rename from indicTrans/subword-nmt/subword_nmt/tests/test_glossaries.py
rename to subword-nmt/subword_nmt/tests/test_glossaries.py