init
- .ipynb_checkpoints/config-checkpoint.json +0 -27
- README.md +0 -93
.ipynb_checkpoints/config-checkpoint.json
DELETED
@@ -1,27 +0,0 @@
{
  "_name_or_path": "./yi-ko-6b/",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.33.1",
  "use_cache": true,
  "vocab_size": 78464
}
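
A minimal sketch of reading this configuration with `transformers`, assuming the published beomi/Yi-Ko-6B repository carries the same values (the repo is gated, so access must already be granted):

```python
# Minimal sketch (assumption: the published beomi/Yi-Ko-6B repo uses this same config.json).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("beomi/Yi-Ko-6B")
print(config.model_type)           # "llama" -> loadable with LlamaForCausalLM
print(config.num_key_value_heads)  # 4 -> grouped-query attention (GQA)
print(config.vocab_size)           # 78464 -> expanded Korean vocabulary
```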
README.md
DELETED
@@ -1,93 +0,0 @@
---
extra_gated_heading: Access beomi/Yi-Ko-6B on Hugging Face
extra_gated_button_content: Submit
extra_gated_fields:
  I agree to share my name, email address and username: checkbox
  I confirm that I understand this project is for research purposes only, and confirm that I agree to follow the LICENSE of this model: checkbox
language:
- en
- ko
pipeline_tag: text-generation
inference: false
tags:
- pytorch
- Yi-Ko
- 01-ai
- Yi
library_name: transformers
license: other
---

> Update @ 2023.12.03 Yi-Ko(KoEN)-6B achieved #1🥇 Pretrained Models at the [Open Korean LLM Leaderboard](https://huggingface.co/spaces/upstage/open-ko-llm-leaderboard)! 🎉

> Update @ 2023.12.01 Alpha release of the Yi-Ko(KoEN)-6B model 🎉

# **beomi/Yi-Ko-6B**

Yi-Ko series models serve as advanced iterations of the 01-ai/Yi models, benefiting from an expanded vocabulary and the inclusion of a Korean/English corpus in their further pretraining. Just like their predecessors, Yi-Ko series models operate within the broad range of generative text models that stretch from 6 billion to 34 billion parameters. This repository focuses on the **6B** pretrained version, which is tailored to fit the Hugging Face Transformers format. For access to the other models, feel free to consult the index provided below.

## Model Details

**Model Developers** Junbum Lee (Beomi)

**Variations** Yi-Ko series will come in a range of parameter sizes — 6B and 34B variations.

**Input** Models input text only.

**Output** Models generate text only.

**Model Architecture**

Yi-Ko series models are auto-regressive language models that use an optimized transformer architecture based on Llama-2*.

<small>*The Yi model architecture is based on Llama 2, so it can be loaded via the `LlamaForCausalLM` class on HF.</small>

|Model Name|Training Data|Params|Context Length|GQA|Trained Tokens|LR|Batch Size (per step)|
|---|---|---|---|---|---|---|---|
|Yi-Ko-6B|*A mix of Korean + English online data*|6B|4k|O|>60B|5e-5|2048|
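
A minimal loading sketch along the lines of the note above, assuming access to the gated repository and enough memory for the 6B bfloat16 weights (the Korean prompt is only an arbitrary example):

```python
# Minimal sketch, assuming access to the gated beomi/Yi-Ko-6B repo.
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("beomi/Yi-Ko-6B")
model = LlamaForCausalLM.from_pretrained(
    "beomi/Yi-Ko-6B",
    torch_dtype=torch.bfloat16,  # matches "torch_dtype": "bfloat16" in config.json
    device_map="auto",           # assumes `accelerate` is installed
)

prompt = "안녕하세요, 오늘은"  # arbitrary example prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```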

**Vocab Expansion**

| Model Name | Vocabulary Size | Description |
| --- | --- | --- |
| Original Yi-Series | 64000 | Sentencepiece BPE |
| **Expanded Yi-Ko Series** | 78464 | Sentencepiece BPE. Added Korean vocab and merges |

**Tokenizing "안녕하세요, 오늘은 날씨가 좋네요.ㅎㅎ"**

| Model | # of tokens | Tokens |
| --- | --- | --- |
| Original Yi-Series | 47 | `['<0xEC>', '<0x95>', '<0x88>', '<0xEB>', '<0x85>', '<0x95>', '하', '<0xEC>', '<0x84>', '<0xB8>', '<0xEC>', '<0x9A>', '<0x94>', ',', '▁', '<0xEC>', '<0x98>', '<0xA4>', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '<0xEB>', '<0x82>', '<0xA0>', '<0xEC>', '<0x94>', '<0xA8>', '가', '▁', '<0xEC>', '<0xA2>', '<0x8B>', '<0xEB>', '<0x84>', '<0xA4>', '<0xEC>', '<0x9A>', '<0x94>', '.', '<0xE3>', '<0x85>', '<0x8E>', '<0xE3>', '<0x85>', '<0x8E>']` |
| **Expanded Yi-Ko Series** | 10 | `['▁안녕', '하세요', ',', '▁오늘은', '▁날', '씨가', '▁좋네요', '.', 'ㅎ', 'ㅎ']` |
|<small>*Equal Korean vocab with Llama-2-Ko Series</small>||

**Tokenizing "The Yi series models are large language models trained from scratch by developers at 01.AI."**

| Model | # of tokens | Tokens |
| --- | --- | --- |
| Original Yi-Series | 21 | `['The', '▁Y', 'i', '▁series', '▁models', '▁are', '▁large', '▁language', '▁models', '▁trained', '▁from', '▁scratch', '▁by', '▁developers', '▁at', '▁', '0', '1', '.', 'AI', '.']` |
| **Expanded Yi-Ko Series** | 21 | `['▁The', '▁Y', 'i', '▁series', '▁models', '▁are', '▁large', '▁language', '▁models', '▁trained', '▁from', '▁scratch', '▁by', '▁developers', '▁at', '▁', '0', '1', '.', 'AI', '.']` |
|<small>*Equal Korean vocab with Llama-2-Ko Series</small>| | <small>*Since the **Expanded Yi-Ko Series** prepends `_` at the beginning of the text (to ensure the same tokenization for Korean sentences), it shows a negligible difference for the first token in English tokenization.</small>|
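
A minimal sketch of reproducing the expanded-tokenizer rows of the two comparisons above (assuming access to the gated repository):

```python
# Minimal sketch: reproduce the Yi-Ko token counts reported in the tables above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("beomi/Yi-Ko-6B")

ko_text = "안녕하세요, 오늘은 날씨가 좋네요.ㅎㅎ"
en_text = "The Yi series models are large language models trained from scratch by developers at 01.AI."

ko_tokens = tokenizer.tokenize(ko_text)
en_tokens = tokenizer.tokenize(en_text)
print(len(ko_tokens), ko_tokens)  # 10 tokens per the table above
print(len(en_tokens), en_tokens)  # 21 tokens, first token reported as '▁The'
```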

# **Model Benchmark**

## LM Eval Harness - Korean (polyglot branch)

TBD

## LICENSE

TBD

## Citation

TBD

## Acknowledgement

The training is supported by the [TPU Research Cloud](https://sites.research.google/trc/) program.