Update README.md
README.md CHANGED
@@ -22,7 +22,7 @@ This checkpoint is trained on the Stack data (https://huggingface.co/datasets/bi
 This checkpoint is first trained on code data via masked language modeling (MLM) and then on bimodal text-code pair data. Please refer to the paper for more details.

 ### How to use

-This checkpoint consists of an encoder (356M model), which can be used to extract code embeddings of
+This checkpoint consists of an encoder (356M model), which can be used to extract 1024-dimensional code embeddings. It can be easily loaded using the AutoModel functionality and employs the StarCoder tokenizer (https://arxiv.org/pdf/2305.06161.pdf).

 ```
 from transformers import AutoModel, AutoTokenizer
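The updated paragraph says the encoder yields a 1024-dimensional embedding per code snippet; the usage snippet in the diff is cut off at the hunk boundary, so the pooling step is not shown. As a hedged sketch only: the model card does not state which pooling the checkpoint uses, so the masked mean pooling below, the shapes, and the fake hidden states are all illustrative assumptions, not the repository's actual code.

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average per-token encoder states over non-padding positions.

    hidden_states: (seq_len, 1024) array standing in for the encoder output.
    attention_mask: (seq_len,) array of 1s (real tokens) and 0s (padding).
    """
    mask = attention_mask[:, None].astype(hidden_states.dtype)  # (seq_len, 1)
    return (hidden_states * mask).sum(axis=0) / mask.sum()

# Fake hidden states in place of the real encoder output (seq_len=8, dim=1024).
states = np.random.rand(8, 1024)
mask = np.array([1, 1, 1, 1, 1, 0, 0, 0])  # last three positions are padding

embedding = mean_pool(states, mask)
print(embedding.shape)  # (1024,) -- one fixed-size vector per code snippet
```

With the real checkpoint, `states` would come from the forward pass of the model loaded via `AutoModel`, tokenized with the StarCoder tokenizer; the pooling reduces variable-length token states to the single 1024-dimensional vector the card describes.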