omkarthawakar committed
Commit 6aed7fa · verified · Parent: 5873179

Update README.md

Files changed (1): README.md (+22, -4)
README.md CHANGED
@@ -2,11 +2,13 @@
 license: mit
 license_link: https://huggingface.co/microsoft/phi-2/resolve/main/LICENSE
 language:
 - en
 pipeline_tag: text-generation
 tags:
 - nlp
 - code
+datasets:
+- LLM360/AmberDatasets
 ---
 # MobiLlama-05B
 
@@ -16,6 +18,7 @@ tags:
 
 MobiLlama-05B is a Small Language Model with **0.5 billion** parameters. It was trained using the Amber data sources [Amber-Dataset](https://huggingface.co/datasets/LLM360/AmberDatasets).
 
+[GitHub](https://github.com/mbzuai-oryx/MobiLlama)
 
 ## Model Description
 
@@ -25,7 +28,6 @@
 - **Resources for more information:**
   - [Training Code](https://github.com/LLM360/amber-train)
   - [Data Preparation](https://github.com/LLM360/amber-data-prep)
-  - [Metrics](https://github.com/LLM360/Analysis360)
   - [Fully processed Amber pretraining data](https://huggingface.co/datasets/LLM360/AmberDatasets)
 
 
@@ -55,6 +57,7 @@ print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
 ```
 
 ## Evaluation
+
 | Evaluation Benchmark | MobiLlama-0.5B | MobiLlama-0.8B | MobiLlama-1.2B |
 | ----------- | ----------- | ----------- | ----------- |
 | HellaSwag | 0.5252 | 0.5409 | 0.6299 |
@@ -67,7 +70,22 @@ print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
 | SIQA | 0.4022 | 0.4160 | 0.4196 |
 | Winogrande | 0.5753 | 0.5745 | 0.6108 |
 
+
+## Hyperparameters
+| Hyperparameter | Value |
+| ----------- | ----------- |
+| Total Parameters | 0.52B |
+| Hidden Size | 2048 |
+| Intermediate Size (MLPs) | 5632 |
+| Number of Attention Heads | 32 |
+| Number of Hidden Layers | 22 |
+| RMSNorm ɛ | 1e-5 |
+| Max Seq Length | 2048 |
+| Vocab Size | 32000 |
+
 ## Intended Uses
 
 Given the nature of the training data, the MobiLlama-05B model is best suited for prompts using the QA format, the chat format, and the code format.
 
+## Citation
+Coming soon
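Only the tail of the README's usage snippet is visible in the hunk context above (the closing fence and the final print call). For orientation, here is a minimal loading-and-generation sketch that ends in that same line; the hub id MBZUAI/MobiLlama-05B and the trust_remote_code flag are assumptions, not confirmed by this diff:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MBZUAI/MobiLlama-05B"  # assumed hub id, not shown in this diff
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "What are the main sources of renewable energy?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens, mirroring the context line
# visible in the diff (drops the prompt and the final token).
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
```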
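The added Hyperparameters table supports a quick sanity check: with a hidden size of 2048, an intermediate size of 5632, and 22 layers, a vanilla Llama-style stack would land well above 1B parameters, so the 0.52B total is only consistent if the feed-forward weights are shared across layers, which matches the parameter-sharing design the MobiLlama project describes. A back-of-the-envelope sketch, assuming gated MLPs, no biases, and untied embeddings (illustrative, not an official count):

```python
# Rough parameter count from the Hyperparameters table.
# Assumptions (not stated in the diff): Llama-style blocks with gated
# MLPs (gate/up/down projections), no biases, untied embeddings.
vocab, hidden, inter, layers = 32000, 2048, 5632, 22

embeddings = 2 * vocab * hidden       # input embeddings + lm_head
attn_per_layer = 4 * hidden * hidden  # q, k, v, o projections
mlp = 3 * hidden * inter              # gate, up, down projections

vanilla = embeddings + layers * (attn_per_layer + mlp)
shared_ffn = embeddings + layers * attn_per_layer + mlp  # one MLP shared by all blocks

print(f"vanilla Llama-style: {vanilla / 1e9:.2f}B")      # ~1.26B
print(f"shared-FFN variant:  {shared_ffn / 1e9:.2f}B")   # ~0.53B, close to the table's 0.52B
```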
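The Intended Uses paragraph names three prompt styles without showing examples in the visible context. The following prompts are plausible illustrations of each style, not formats the card itself prescribes:

```python
# Illustrative prompts for the three styles named under Intended Uses.

# QA format: pose a question and cue the answer.
qa_prompt = "What is the boiling point of water at sea level?\nAnswer:"

# Chat format: alternate speaker tags and cue the assistant turn.
chat_prompt = "Human: How do I reverse a list in Python?\nAssistant:"

# Code format: give a signature and docstring, let the model complete it.
code_prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
```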