omkarthawakar committed
Commit 6aed7fa · verified · Parent: 5873179

Update README.md

Files changed (1): README.md (+22, -4)
README.md CHANGED
@@ -2,11 +2,13 @@
 license: mit
 license_link: https://huggingface.co/microsoft/phi-2/resolve/main/LICENSE
 language:
 - en
 pipeline_tag: text-generation
 tags:
 - nlp
 - code
+datasets:
+- LLM360/AmberDatasets
 ---
 # MobiLlama-05B
 
@@ -16,6 +18,7 @@ tags:
 
 MobiLlama-05B is a Small Language Model with **0.5 billion** parameters. It was trained using the Amber data sources [Amber-Dataset](https://huggingface.co/datasets/LLM360/AmberDatasets).
 
+[GitHub](https://github.com/mbzuai-oryx/MobiLlama)
 
 ## Model Description
 
@@ -25,7 +28,6 @@
 - **Resources for more information:**
   - [Training Code](https://github.com/LLM360/amber-train)
   - [Data Preparation](https://github.com/LLM360/amber-data-prep)
-  - [Metrics](https://github.com/LLM360/Analysis360)
   - [Fully processed Amber pretraining data](https://huggingface.co/datasets/LLM360/AmberDatasets)
 
 
@@ -55,6 +57,7 @@ print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
 ```
 
 ## Evaluation
+
 | Evaluation Benchmark | MobiLlama-0.5B | MobiLlama-0.8B | MobiLlama-1.2B |
 | ----------- | ----------- | ----------- | ----------- |
 | HellaSwag | 0.5252 | 0.5409 | 0.6299 |
@@ -67,7 +70,22 @@ print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
 | SIQA | 0.4022 | 0.4160 | 0.4196 |
 | Winogrande | 0.5753 | 0.5745 | 0.6108 |
 
+
+## Hyperparameters
+| Hyperparameter | Value |
+| ----------- | ----------- |
+| Total Parameters | 0.52B |
+| Hidden Size | 2048 |
+| Intermediate Size (MLPs) | 5632 |
+| Number of Attention Heads | 32 |
+| Number of Hidden Layers | 22 |
+| RMSNorm ɛ | 1e-5 |
+| Max Seq Length | 2048 |
+| Vocab Size | 32000 |
+
 ## Intended Uses
 
 Given the nature of the training data, the MobiLlama-05B model is best suited for prompts using the QA format, the chat format, and the code format.
 
+## Citation
+Coming soon
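Only the tail of the README's usage snippet is visible in the hunk context above (the closing fence and the final print call). For orientation, here is a minimal loading-and-generation sketch that ends in that same line; the hub id MBZUAI/MobiLlama-05B and the trust_remote_code flag are assumptions, not confirmed by this diff:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MBZUAI/MobiLlama-05B"  # assumed hub id, not shown in this diff
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "What are the main sources of renewable energy?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens, mirroring the context line
# visible in the diff (drops the prompt and the final token).
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())
```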
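The added Hyperparameters table supports a quick sanity check: with a hidden size of 2048, an intermediate size of 5632, and 22 layers, a vanilla Llama-style stack would land well above 1B parameters, so the 0.52B total is only consistent if the feed-forward weights are shared across layers, which matches the parameter-sharing design the MobiLlama project describes. A back-of-the-envelope sketch, assuming gated MLPs, no biases, and untied embeddings (illustrative, not an official count):

```python
# Rough parameter count from the Hyperparameters table.
# Assumptions (not stated in the diff): Llama-style blocks with gated
# MLPs (gate/up/down projections), no biases, untied embeddings.
vocab, hidden, inter, layers = 32000, 2048, 5632, 22

embeddings = 2 * vocab * hidden       # input embeddings + lm_head
attn_per_layer = 4 * hidden * hidden  # q, k, v, o projections
mlp = 3 * hidden * inter              # gate, up, down projections

vanilla = embeddings + layers * (attn_per_layer + mlp)
shared_ffn = embeddings + layers * attn_per_layer + mlp  # one MLP shared by all blocks

print(f"vanilla Llama-style: {vanilla / 1e9:.2f}B")      # ~1.26B
print(f"shared-FFN variant:  {shared_ffn / 1e9:.2f}B")   # ~0.53B, close to the table's 0.52B
```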
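The Intended Uses paragraph names three prompt styles without showing examples in the visible context. The following prompts are plausible illustrations of each style, not formats the card itself prescribes:

```python
# Illustrative prompts for the three styles named under Intended Uses.

# QA format: pose a question and cue the answer.
qa_prompt = "What is the boiling point of water at sea level?\nAnswer:"

# Chat format: alternate speaker tags and cue the assistant turn.
chat_prompt = "Human: How do I reverse a list in Python?\nAssistant:"

# Code format: give a signature and docstring, let the model complete it.
code_prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
```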