---
library_name: JoeyNMT
task: Machine-translation
tags:
- JoeyNMT
- Machine-translation
language: rw
datasets:
- DigitalUmuganda/kinyarwanda-english-machine-translation-dataset
widget:
- text: "Muraho neza, murakaza neza mu Rwanda."
  example_title: "Muraho neza, murakaza neza mu Rwanda."
---
# Kinyarwanda-to-English Machine Translation

This is a Kinyarwanda-to-English machine translation model built and trained with the JoeyNMT framework. The model uses a Transformer encoder-decoder architecture and was trained on an English-Kinyarwanda bitext of 47,211 sentence pairs prepared by Digital Umuganda.


## Model architecture
**Encoder & Decoder**

	Type: Transformer
	Num_layers: 6
	Num_heads: 8
	Embedding_dim: 256
	FF_size: 1024
	Dropout: 0.1
	Layer_norm: post
	Initializer: xavier
	Total params: 12,563,968
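
For reference, the hyperparameters above correspond to the `model` section of a JoeyNMT config roughly like the sketch below. This is an illustrative reconstruction, not the exact file used to train this model, and key names can differ slightly between JoeyNMT versions.

```
# Sketch of the "model" section of a JoeyNMT 2.x config, reconstructed
# from the hyperparameters listed above (illustrative only).
model:
    initializer: "xavier"
    encoder:
        type: "transformer"
        num_layers: 6
        num_heads: 8
        embeddings:
            embedding_dim: 256
        hidden_size: 256
        ff_size: 1024
        dropout: 0.1
        layer_norm: "post"
    decoder:
        type: "transformer"
        num_layers: 6
        num_heads: 8
        embeddings:
            embedding_dim: 256
        hidden_size: 256
        ff_size: 1024
        dropout: 0.1
        layer_norm: "post"
```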

## Pre-processing

	Tokenizer_type: subword-nmt
	Num_merges: 4000
	BPE codes learned on the training bitext, with a separate vocabulary per language
	Pretokenizer: none
	Lowercase: not applied
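
In a JoeyNMT config, this tokenization is specified in the `data` section, roughly as sketched below. The codes-file names are hypothetical placeholders; only the tokenizer type, merge count, and casing come from this card.

```
# Sketch of the tokenizer part of the "data" section (illustrative;
# the "codes" file names are hypothetical placeholders).
data:
    src:
        lang: "rw"
        level: "bpe"
        lowercase: False
        tokenizer_type: "subword-nmt"
        tokenizer_cfg:
            num_merges: 4000
            codes: "bpe.codes.rw"   # hypothetical codes file
    trg:
        lang: "en"
        level: "bpe"
        lowercase: False
        tokenizer_type: "subword-nmt"
        tokenizer_cfg:
            num_merges: 4000
            codes: "bpe.codes.en"   # hypothetical codes file
```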

## Training
	Optimizer: Adam
	Loss: crossentropy
	Epochs: 30
	Batch_size: 256
	Number of GPUs: 1
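
These settings map onto the `training` section of a JoeyNMT config along the lines of the sketch below; the learning rate and scheduler are omitted because they are not stated in this card.

```
# Sketch of the "training" section (illustrative; only the values listed
# above are taken from this card).
training:
    optimizer: "adam"
    loss: "crossentropy"
    epochs: 30
    batch_size: 256
    use_cuda: True   # trained on a single GPU
```

With a complete config, training is launched with `python -m joeynmt train args.yaml`.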



## Evaluation

	Evaluation_metrics: BLEU, chrF
	Tokenization: none
	Beam_width: 15
	Beam_alpha: 1.0
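
The decoding settings above correspond to a `testing` section roughly like the following sketch (illustrative; key names may differ across JoeyNMT versions):

```
# Sketch of the "testing" section (illustrative reconstruction).
testing:
    beam_size: 15
    beam_alpha: 1.0
    eval_metrics: ["bleu", "chrf"]
```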

## Tools
	* JoeyNMT 2.0.0
	* datasets
	* pandas
	* numpy
	* transformers
	* sentencepiece
	* PyTorch (with CUDA)
	* sacrebleu
	* protobuf>=3.20.1

## How to train

See the [JoeyNMT repository](https://github.com/joeynmt/joeynmt) for full training instructions.

## Translation
To install JoeyNMT, run:
```
$ git clone https://github.com/joeynmt/joeynmt.git
$ cd joeynmt
$ pip install -e .
```

Interactive translation (stdin):
```
$ python -m joeynmt translate args.yaml
```

File translation:
```
$ python -m joeynmt translate args.yaml < src_lang.txt > hypothesis_trg_lang.txt
```

## Accuracy measurement
Sacrebleu installation:
```
$ pip install sacrebleu
```

Measurement (BLEU, chrF):
```
$ sacrebleu reference.tsv -i hypothesis.tsv -m bleu chrf 
```

## To-do

>* Test the model on additional datasets, including JW300.
>* Train some available state-of-the-art (SOTA) models on the Digital Umuganda dataset.
>* Expand the dataset.

## Result
The following results were obtained using sacrebleu.


Kinyarwanda-to-English:
```
BLEU: 79.87
chrF: 84.40
```