---
library_name: JoeyNMT
task: Machine-translation
tags:
- JoeyNMT
- Machine-translation
language: rw
datasets:
- DigitalUmuganda/kinyarwanda-english-machine-translation-dataset
widget:
- text: "Muraho neza, murakaza neza mu Rwanda."
  example_title: "Muraho neza, murakaza neza mu Rwanda."
---
# Kinyarwanda-to-English Machine Translation

This is a Kinyarwanda-to-English machine translation model built and trained with the JoeyNMT framework. The model uses a Transformer encoder-decoder architecture and was trained on an English-Kinyarwanda bitext of 47,211 sentence pairs prepared by Digital Umuganda.


## Model architecture
**Encoder & Decoder**

	Type: Transformer
	Num_layers: 6
	Num_heads: 8
	Embedding_dim: 256
	FF_size: 1024
	Dropout: 0.1
	Layer_norm: post
	Initializer: xavier
	Total params: 12,563,968
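
For reference, the hyperparameters above correspond to the `model` section of a JoeyNMT config roughly like the sketch below. This is an illustrative reconstruction, not the exact file used to train this model, and key names can differ slightly between JoeyNMT versions.

```
# Sketch of the "model" section of a JoeyNMT 2.x config, reconstructed
# from the hyperparameters listed above (illustrative only).
model:
    initializer: "xavier"
    encoder:
        type: "transformer"
        num_layers: 6
        num_heads: 8
        embeddings:
            embedding_dim: 256
        hidden_size: 256
        ff_size: 1024
        dropout: 0.1
        layer_norm: "post"
    decoder:
        type: "transformer"
        num_layers: 6
        num_heads: 8
        embeddings:
            embedding_dim: 256
        hidden_size: 256
        ff_size: 1024
        dropout: 0.1
        layer_norm: "post"
```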

## Pre-processing

	Tokenizer_type: subword-nmt
	Num_merges: 4000
	BPE codes learned on the training bitext, with a separate vocabulary per language
	Pretokenizer: none
	Lowercase: not applied
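
In a JoeyNMT config, this tokenization is specified in the `data` section, roughly as sketched below. The codes-file names are hypothetical placeholders; only the tokenizer type, merge count, and casing come from this card.

```
# Sketch of the tokenizer part of the "data" section (illustrative;
# the "codes" file names are hypothetical placeholders).
data:
    src:
        lang: "rw"
        level: "bpe"
        lowercase: False
        tokenizer_type: "subword-nmt"
        tokenizer_cfg:
            num_merges: 4000
            codes: "bpe.codes.rw"   # hypothetical codes file
    trg:
        lang: "en"
        level: "bpe"
        lowercase: False
        tokenizer_type: "subword-nmt"
        tokenizer_cfg:
            num_merges: 4000
            codes: "bpe.codes.en"   # hypothetical codes file
```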

## Training
	Optimizer: Adam
	Loss: crossentropy
	Epochs: 30
	Batch_size: 256
	Number of GPUs: 1
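
These settings map onto the `training` section of a JoeyNMT config along the lines of the sketch below; the learning rate and scheduler are omitted because they are not stated in this card.

```
# Sketch of the "training" section (illustrative; only the values listed
# above are taken from this card).
training:
    optimizer: "adam"
    loss: "crossentropy"
    epochs: 30
    batch_size: 256
    use_cuda: True   # trained on a single GPU
```

With a complete config, training is launched with `python -m joeynmt train args.yaml`.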



## Evaluation

	Evaluation_metrics: BLEU, chrF
	Tokenization: none
	Beam_width: 15
	Beam_alpha: 1.0
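
The decoding settings above correspond to a `testing` section roughly like the following sketch (illustrative; key names may differ across JoeyNMT versions):

```
# Sketch of the "testing" section (illustrative reconstruction).
testing:
    beam_size: 15
    beam_alpha: 1.0
    eval_metrics: ["bleu", "chrf"]
```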

## Tools
	* JoeyNMT 2.0.0
	* datasets
	* pandas
	* numpy
	* transformers
	* sentencepiece
	* PyTorch (with CUDA)
	* sacrebleu
	* protobuf>=3.20.1

## How to train

See the [JoeyNMT repository](https://github.com/joeynmt/joeynmt) for full training instructions.

## Translation
To install JoeyNMT, run:
```
$ git clone https://github.com/joeynmt/joeynmt.git
$ cd joeynmt
$ pip install -e .
```

Interactive translation (stdin):
```
$ python -m joeynmt translate args.yaml
```

File translation:
```
$ python -m joeynmt translate args.yaml < src_lang.txt > hypothesis_trg_lang.txt
```

## Accuracy measurement
Sacrebleu installation:
```
$ pip install sacrebleu
```

Measurement (BLEU, chrF):
```
$ sacrebleu reference.tsv -i hypothesis.tsv -m bleu chrf 
```

## To-do

>* Test the model on additional datasets, including JW300.
>* Train some available state-of-the-art (SOTA) models on the Digital Umuganda dataset.
>* Expand the dataset.

## Result
The following results were obtained using sacrebleu.


Kinyarwanda-to-English:
```
BLEU: 79.87
chrF: 84.40
```