---
license: mit
---

# Bengali to English Word Aligner

A fine-tuned model for **Bengali to English word alignment**, built on `bert-base-multilingual-cased`.

## Quick Start

Load the tokenizer and model to use them in your project:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("musfiqdehan/bengali-english-word-aligner")
model = AutoModel.from_pretrained("musfiqdehan/bengali-english-word-aligner")
```

## Bengali-English Word Alignment

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1x5wUXS7vdWNeROkJS_B_lUwKTJZGaB7v?usp=sharing)
[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/musfiqdehan/bengali-english-alignment-demo)

Install Dependencies

```
!pip install -U data-preprocessors
!pip install -U bangla-postagger
```

Import Necessary Libraries

```python
from pprint import pprint
from data_preprocessors import text_preprocessor as tp
from bangla_postagger import (en_postaggers as ep,
                              bn_en_mapper as bem,
                              translators as trans)
```

Testing Word Mapping and Alignment

```python
src = "আমি ভাত খাই না, রুটি খাই।"
tgt = "I do not eat rice, I eat bread."

# Give one space before and after punctuation
# for easy tokenization
src = tp.space_punc(src)
tgt = tp.space_punc(tgt)

print("Word Mapping:")
mapping = bem.get_word_mapping(
    source=src,
    target=tgt,
    model_path="musfiqdehan/bengali-english-word-aligner")
pprint(mapping)
```

Output

```
Word Mapping:
['bn:(আমি) -> en:(I)',
 'bn:(ভাত) -> en:(rice)',
 'bn:(খাই) -> en:(do)',
 'bn:(খাই) -> en:(eat)',
 'bn:(না) -> en:(not)',
 'bn:(,) -> en:(,)',
 'bn:(রুটি) -> en:(bread)',
 'bn:(খাই) -> en:(eat)',
 'bn:(।) -> en:(.)']
```
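
Post-processing the Mapping

The mapping printed above is a list of strings of the form `bn:(...) -> en:(...)`. For downstream use it may be easier to work with plain tuples. The sketch below is one possible way to do that. It assumes the string format stays exactly as in the output shown here, and `parse_mapping` and `group_by_source` are hypothetical helpers written for this example, not part of `bangla-postagger`.

```python
import re
from collections import defaultdict

# Matches entries such as 'bn:(আমি) -> en:(I)'.
# Assumption: the format is exactly the one shown in the output above.
PAIR_PATTERN = re.compile(r"bn:\((.+)\) -> en:\((.+)\)")


def parse_mapping(mapping):
    """Convert mapping strings into (bengali, english) tuples."""
    pairs = []
    for entry in mapping:
        match = PAIR_PATTERN.fullmatch(entry)
        if match is None:
            raise ValueError(f"Unexpected mapping entry: {entry!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def group_by_source(pairs):
    """Collect the distinct English words aligned to each Bengali word."""
    grouped = defaultdict(list)
    for bn, en in pairs:
        if en not in grouped[bn]:
            grouped[bn].append(en)
    return dict(grouped)


# A few entries copied from the output above.
sample = [
    "bn:(আমি) -> en:(I)",
    "bn:(খাই) -> en:(do)",
    "bn:(খাই) -> en:(eat)",
    "bn:(,) -> en:(,)",
]

pairs = parse_mapping(sample)
print(pairs[0])                        # ('আমি', 'I')
print(group_by_source(pairs)["খাই"])   # ['do', 'eat']
```

Note that `group_by_source` keys on the word text, so repeated occurrences of the same Bengali word in a sentence (such as `খাই` above) are merged into one entry. Keep the tuple list from `parse_mapping` if the order of the original alignments matters.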