schnoh committed on
Commit 8c99926 · verified
1 Parent(s): 9b7be80

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -44,6 +44,9 @@ We sampled 16B tokens from the following datasets for training:
  </tr>
  </table>
 
+ We trained this model using a context length of 4k due to resource limitations and to maximize training speed.
+ However, the original model was trained with a context length of 8k, so an 8k context length could work well in downstream tasks.
+
  ### Hyperparameters
 
  <table>
@@ -128,7 +131,7 @@ We evaluated this model using both English and Korean benchmarks, and compared i
  <td><strong>73.8</strong></td>
  </tr>
  <tr>
- <td><strong>tesser/Tesser-Llama-3-Ko-8B</strong></td>
+ <td><strong>tesser-ai/Tesser-Llama-3-Ko-8B</strong></td>
  <td><u>60.5</u></td>
  <td><u>79.8</u></td>
  <td><u>40.3</u></td>