eunyounglee committed
Commit fccfcee
1 Parent(s): a788b47

Update README.md

Files changed (1)
  1. README.md +9 -15
README.md CHANGED
@@ -8,36 +8,30 @@ Config file: 2.7B
 ---
 # Model Card for Model ID
 
-This model is pretrained and fine-tuned with a Vietnamese dataset, based on GPT-NeoX, a large language model developed by EleutherAI.
-A GPT-NeoX model pretrained on a 450GB+ Vietnamese dataset and fine-tuned with a 12MB Vietnamese question-and-answer dataset. Trained on an A100 40GB GPU and a 48-core CPU; training took 18 hours to reach 10 epochs.
+This model is pretrained and fine-tuned on the Vietnamese language, based on GPT-NeoX, a large language model developed by EleutherAI.
+
 
 ## Model Details
 
 ### Training Data
 - **Pre-train:**
-Vietnamese CulturaX Dataset (450GB) + Project (1.3GB) + Crawled Vietnamese Wikipedia (630MB) + viwik18 (1.27GB)
+CulturaX Vietnamese Dataset (450GB) + AI-Hub Vietnamese Dataset (1.3GB) + Crawled Vietnamese Wikipedia Dataset (630MB) + viwik18 Dataset (1.27GB)
 - **Fine-tuning:**
 12MB Vietnamese Question & Answer dataset
 Vietnamese Alpaca (16,412 rows) + Vietnamese QA Dataset based on viwik18 (14,293 rows)
 
 ### Training Hardware
-- **Developed by:** Deeploading
-- **Model type:** GPT-NeoX
-- **Language(s) (NLP):** Vietnamese
+Trained on an A100 40GB GPU and a 48-core CPU. Training took 18 hours to reach 10 epochs.
 
 <figure style="width:30em">
 
 | Hyperparameter         | Value       |
 | ---------------------- | ----------- |
-| n<sub>parameters</sub> | 2670182400  |
-| n<sub>layers</sub>     | 32          |
-| d<sub>model</sub>      | 2560        |
-| n<sub>heads</sub>      | 32          |
-| d<sub>head</sub>       | 128         |
-| n<sub>vocab</sub>      | 60000       |
-| Sequence Length        | 2048        |
-| Learning Rate          | 0.00016     |
-| Positional Encoding    | [Rotary Position Embedding (RoPE)](https://arxiv.org/abs/2104.09864) |
+| num_train_epochs       | 10          |
+| train_batch_size       | 2           |
+| learning_rate          | 0.0001      |
+| warmup_steps           | 1000        |
+| weight_decay           | 0           |
 </figure>
 
 ### How to use
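
The fine-tuning hyperparameters in the updated table map directly onto the `transformers.TrainingArguments` API. A minimal sketch of that configuration, assuming the standard Hugging Face `Trainer` workflow (the `output_dir` is a placeholder, and `num_train_epochs=10` follows the "10 epochs" noted under Training Hardware):

```python
from transformers import TrainingArguments

# Sketch of the fine-tuning configuration from the hyperparameter table.
training_args = TrainingArguments(
    output_dir="gpt-neox-vi-qa",    # placeholder output path
    num_train_epochs=10,            # "took 18 hours to reach 10 epochs"
    per_device_train_batch_size=2,  # train_batch_size
    learning_rate=1e-4,             # learning_rate = 0.0001
    warmup_steps=1000,              # warmup_steps
    weight_decay=0.0,               # weight_decay
)
```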
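
Under "How to use", a GPT-NeoX checkpoint like this one loads with the standard `transformers` causal-LM pattern. A minimal sketch, assuming a hypothetical Hub repo id for this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id; substitute the actual Hub id of this checkpoint.
model_id = "eunyounglee/gpt-neox-vietnamese-2.7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short Vietnamese continuation from a prompt.
inputs = tokenizer("Xin chào, tôi là", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```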