Update README.md
README.md
CHANGED
@@ -49,6 +49,11 @@ Figure 3: Test loss closeup, testing performed on split of internal-corpus #1. S
 
 ## Training Method
 ### Vocabulary Swap
+To transfer knowledge from English model to Czech, we developed a simple method that (i) aligns several tokens between two vocabularies and (ii) copies the embeddings from original language to new language.
+<img src="figures/tllama_test.png" width="900"/>
+
+Figure 4: Ablation: Test perplexity over the course of training for vocabulary swap method on TinyLLAMA. Our method (green curve) vs TinyLLAMA training from scratch (blue curve).
+
 The vocabulary swap was done the same way as our [Czech-GPT-2](https://huggingface.co/BUT-FIT/Czech-GPT-2-XL-133k) model (check it out for comprehensive description.)
 We managed to align 4,177 english tokens with corresponding czech tokens.
 
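The two steps named in the Vocabulary Swap paragraph, (i) aligning tokens between the two vocabularies and (ii) copying their embeddings, can be illustrated with a small sketch. This is not the authors' implementation: the README does not say how tokens are aligned or how unaligned tokens are initialized, so the sketch assumes exact string matching for alignment and a noisy mean of the source embeddings for everything else. The function name `swap_vocabulary` and the toy vocabularies are hypothetical.

```python
import random


def swap_vocabulary(src_vocab, src_emb, tgt_vocab, seed=0):
    """Build a target-language embedding table from a source-language one.

    src_vocab / tgt_vocab: dict mapping token string -> row index.
    src_emb: list of embedding rows (lists of floats), one per source token.
    Returns (tgt_emb, n_aligned).
    """
    rng = random.Random(seed)
    dim = len(src_emb[0])
    # ASSUMPTION: unaligned tokens start from the mean source embedding
    # plus small Gaussian noise. The README does not specify this.
    mean = [sum(row[d] for row in src_emb) / len(src_emb) for d in range(dim)]

    tgt_emb = [None] * len(tgt_vocab)
    n_aligned = 0
    for token, tgt_id in tgt_vocab.items():
        # ASSUMPTION: step (i), alignment, is an exact string match.
        src_id = src_vocab.get(token)
        if src_id is not None:
            # Step (ii): copy the source embedding into the new vocabulary.
            tgt_emb[tgt_id] = list(src_emb[src_id])
            n_aligned += 1
        else:
            tgt_emb[tgt_id] = [m + rng.gauss(0.0, 0.02) for m in mean]
    return tgt_emb, n_aligned


# Toy example: two of the three target tokens also exist in the source vocabulary.
src_vocab = {"the": 0, "model": 1, "2024": 2}
src_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
tgt_vocab = {"model": 0, "2024": 1, "jazyk": 2}

tgt_emb, n_aligned = swap_vocabulary(src_vocab, src_emb, tgt_vocab)
print(n_aligned)
```

In a real model the same copy would be applied to the input embedding matrix (and to the output projection if it is not tied), with the count of aligned tokens corresponding to the 4,177 figure reported in the README.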