Update README.md
README.md CHANGED

@@ -44,6 +44,9 @@ We sampled 16B tokens from the following datasets for training:
   </tr>
 </table>
 
+We trained this model using a context length of 4k due to resource limitations and to maximize training speed.
+However, the original model was trained with a context length of 8k, so an 8k context length could work well in downstream tasks.
+
 ### Hyperparameters
 
 <table>
@@ -128,7 +131,7 @@ We evaluated this model using both English and Korean benchmarks, and compared i
     <td><strong>73.8</strong></td>
   </tr>
   <tr>
-    <td><strong>tesser/Tesser-Llama-3-Ko-8B</strong></td>
+    <td><strong>tesser-ai/Tesser-Llama-3-Ko-8B</strong></td>
     <td><u>60.5</u></td>
     <td><u>79.8</u></td>
     <td><u>40.3</u></td>