Bengali to English Word Aligner

A fine-tuned model for Bengali-to-English word alignment, built on bert-base-multilingual-cased.

Quick Start

Load the tokenizer and model to use them in your project:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("musfiqdehan/bengali-english-word-aligner")
model = AutoModel.from_pretrained("musfiqdehan/bengali-english-word-aligner")

Bengali-English Word Alignment

Install Dependencies

!pip install -U data-preprocessors
!pip install -U bangla-postagger

Import Necessary Libraries

from pprint import pprint
from data_preprocessors import text_preprocessor as tp
from bangla_postagger import (en_postaggers as ep,
                              bn_en_mapper as bem,
                              translators as trans)

Testing Word Mapping and Alignment

src = "আমি ভাত খাই না, রুটি খাই।"
tgt = "I do not eat rice, I eat bread."

# Give one space before and after punctuation
# for easy tokenization
src = tp.space_punc(src)
tgt = tp.space_punc(tgt)

print("Word Mapping:")
mapping = bem.get_word_mapping(
    source=src, target=tgt, model_path="musfiqdehan/bengali-english-word-aligner")
pprint(mapping)
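
The punctuation-spacing step in the example can be illustrated without the library. The following is only a rough sketch of the idea, assuming that spacing means surrounding each punctuation mark with single spaces. The name space_punct_sketch and the PUNCT set are hypothetical and are not part of data-preprocessors, whose actual behavior may differ.

```python
import re
import string

# Assumption: ASCII punctuation plus the Bengali danda is enough for this demo.
PUNCT = string.punctuation + "।"
_PUNCT_RE = re.compile("([" + re.escape(PUNCT) + "])")


def space_punct_sketch(text: str) -> str:
    """Put one space on each side of every punctuation mark (hypothetical helper)."""
    spaced = _PUNCT_RE.sub(r" \1 ", text)
    # Collapse the repeated spaces created by the substitution.
    return " ".join(spaced.split())


bn_spaced = space_punct_sketch("আমি ভাত খাই না, রুটি খাই।")
en_spaced = space_punct_sketch("I do not eat rice, I eat bread.")
```

An explicit punctuation set is used here instead of a \w-based pattern because Python's \w does not match Bengali vowel signs, so such a pattern would split words. Note that this sketch also splits apostrophes, so a contraction like "don't" would become three tokens.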

Output

Word Mapping:
['bn:(আমি) -> en:(I)',
 'bn:(ভাত) -> en:(rice)',
 'bn:(খাই) -> en:(do)',
 'bn:(খাই) -> en:(eat)',
 'bn:(না) -> en:(not)',
 'bn:(,) -> en:(,)',
 'bn:(রুটি) -> en:(bread)',
 'bn:(খাই) -> en:(eat)',
 'bn:(।) -> en:(.)']
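
The mapping is returned as a list of formatted strings, which is convenient for printing but awkward for further processing. Below is a minimal sketch for turning those strings into (Bengali, English) pairs, assuming every entry follows the 'bn:(...) -> en:(...)' shape shown in the output above. The helper names parse_mapping and group_by_source are hypothetical and are not part of bangla-postagger.

```python
import re
from typing import Dict, List, Tuple

# Matches entries shaped like 'bn:(আমি) -> en:(I)'.
_PAIR_RE = re.compile(r"^bn:\((.*)\) -> en:\((.*)\)$")


def parse_mapping(entries: List[str]) -> List[Tuple[str, str]]:
    """Convert formatted mapping strings into (bn, en) tuples."""
    pairs = []
    for entry in entries:
        match = _PAIR_RE.match(entry)
        if match is None:
            raise ValueError(f"Unexpected mapping entry: {entry!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def group_by_source(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect the distinct English words aligned to each Bengali surface form."""
    grouped: Dict[str, List[str]] = {}
    for bn, en in pairs:
        targets = grouped.setdefault(bn, [])
        if en not in targets:
            targets.append(en)
    return grouped


# Entries copied from the output above.
sample = [
    "bn:(আমি) -> en:(I)",
    "bn:(ভাত) -> en:(rice)",
    "bn:(খাই) -> en:(do)",
    "bn:(খাই) -> en:(eat)",
    "bn:(না) -> en:(not)",
    "bn:(,) -> en:(,)",
    "bn:(রুটি) -> en:(bread)",
    "bn:(খাই) -> en:(eat)",
    "bn:(।) -> en:(.)",
]
pairs = parse_mapping(sample)
grouped = group_by_source(pairs)
```

Because the strings carry no token positions, group_by_source merges repeated occurrences of the same Bengali word (here both instances of খাই), so it is suited to building a small lexicon rather than recovering position-level alignments.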