Update README.md
README.md
CHANGED
@@ -47,6 +47,9 @@ model = AutoModelForCausalLM.from_pretrained("sambanovasystems/SambaLingo-Turkis
 
 ## Training Details
 
+## Tokenizer Details
+We extended the vocabulary of the base llama model from 32,000 tokens to 57,000 tokens by adding up to 25,000 non-overlapping tokens from the new language.
+
 ## Uses
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
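The vocabulary arithmetic in the added "Tokenizer Details" text (32,000 base tokens plus up to 25,000 non-overlapping new tokens, giving 57,000) can be illustrated with a small sketch. The commit does not describe the actual procedure, so everything here is an assumption for illustration: the function name `extend_vocab`, the synthetic token strings, and the rule that candidates already present in the base vocabulary are skipped.

```python
def extend_vocab(base_vocab, candidate_tokens, max_new_tokens):
    """Return a copy of base_vocab with up to max_new_tokens unseen tokens appended.

    Hypothetical helper: tokens already in the base vocabulary are skipped,
    which is one plausible reading of "non-overlapping".
    """
    extended = dict(base_vocab)
    # New ids continue directly after the highest existing id.
    next_id = max(base_vocab.values()) + 1 if base_vocab else 0
    added = 0
    for token in candidate_tokens:
        if added >= max_new_tokens:
            break  # cap reached; remaining candidates are ignored
        if token in extended:
            continue  # overlapping token: keep its original id
        extended[token] = next_id
        next_id += 1
        added += 1
    return extended


# Synthetic stand-ins for the real vocabularies (assumed, not from the model card).
base_vocab = {f"base_{i}": i for i in range(32_000)}
candidates = [f"base_{i}" for i in range(100)] + [f"new_{i}" for i in range(30_000)]

extended = extend_vocab(base_vocab, candidates, max_new_tokens=25_000)
print(len(base_vocab), len(extended))  # 32000 57000
```

When working with Hugging Face `transformers`, a common way to apply this kind of extension is `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`, so the embedding matrix matches the larger vocabulary. Whether the authors used that route is not stated in this diff.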