Update README.md
README.md CHANGED

@@ -44,6 +44,9 @@ We sampled 16B tokens from the following datasets for training:
   </tr>
 </table>
 
+We trained this model using a context length of 4k due to resource limitations and to maximize training speed.
+However, the original model was trained with a context length of 8k, so an 8k context length could work well in downstream tasks.
+
 ### Hyperparameters
 
 <table>
@@ -128,7 +131,7 @@ We evaluated this model using both English and Korean benchmarks, and compared i
     <td><strong>73.8</strong></td>
   </tr>
   <tr>
-    <td><strong>tesser/Tesser-Llama-3-Ko-8B</strong></td>
+    <td><strong>tesser-ai/Tesser-Llama-3-Ko-8B</strong></td>
     <td><u>60.5</u></td>
     <td><u>79.8</u></td>
     <td><u>40.3</u></td>