---
tags:
- fastai
- text-translation
language: ml
widget:
- text: "കേൾക്കുന്ന എല്ലാ കാര്യങ്ങളും എനിക്കു മനസിലായില്ല"
  example_title: "Malayalam Seq2Seq translation"
---

# Malayalam - English ULMFiT translation model (Work in Progress)

[![Malayalam: Kaggle notebook](https://img.shields.io/badge/മലയാളം%20-notebook-green.svg)](https://www.kaggle.com/code/rajeshradhakrishnan/ml-ulmfit-seq2seq-translation)

---

# malayalam-ULMFit-Seq2Seq (Translation model)

The malayalam-ULMFit-Seq2Seq model is built on a language model pre-trained with [fastai](https://docs.fast.ai/text.data.html), following the [Malyalam_Language_Model_ULMFiT](https://github.com/goru001/nlp-for-malyalam/blob/master/language-model/Malyalam_Language_Model_ULMFiT.ipynb) notebook.

Text is tokenized using SentencePiece with a vocabulary size of 10,000. The language model is uploaded as a [Kaggle dataset](https://www.kaggle.com/datasets/rajeshradhakrishnan/ulmfit-fastai).

## Usage

```
# Notebook cell: install the Hub client with fastai support
!pip install -Uqq "huggingface_hub[fastai]"

from huggingface_hub import from_pretrained_fastai

# Placeholder: replace <namespace> with the Hub account that hosts this model
repo_id = "<namespace>/malayalam-ULMFit-Seq2Seq"
learner = from_pretrained_fastai(repo_id)

# Malayalam source sentence and its reference English translation
original_xtext = 'കേൾക്കുന്ന എല്ലാ കാര്യങ്ങളും എനിക്കു മനസിലായില്ല'
original_ytext = 'I didnt understand all this'
predicted_text = learner.predict(original_xtext)

print(f'original text: {original_xtext}')
print(f'original answer: {original_ytext}')
print(f'predicted text: {predicted_text}')
```
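
Standard fastai learners return a tuple from `predict`, with the decoded output as the first item. Whether this Seq2Seq learner does the same is an assumption, so the sketch below handles both a tuple and a plain value. The helper name `first_prediction` is hypothetical and is not part of fastai or `huggingface_hub`.

```python
def first_prediction(result):
    """Return the decoded item from a fastai-style predict() result."""
    # Assumption: tuple or list results carry the decoded output first.
    if isinstance(result, (tuple, list)) and len(result) > 0:
        return result[0]
    # Anything else (for example a plain string) is passed through unchanged.
    return result


# Stand-in values so the sketch runs without downloading the model.
print(first_prediction(("decoded text", None, None)))
print(first_prediction("decoded text"))
```

In the usage snippet above this would be called as `first_prediction(learner.predict(original_xtext))`.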

## Intended uses & limitations

The model has not been fine-tuned to state-of-the-art accuracy.

## Training and evaluation data

[Malayalam Samanantar dataset (English-Malayalam pairs), uploaded to Kaggle](https://www.kaggle.com/datasets/rajeshradhakrishnan/ulmfit-fastai)