MBZUAI/LaMini-Flan-T5-77M

@@ -18,10 +18,68 @@ widget:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# LaMini-FLAN-T5-Small
 This model is part of our LaMini model series, introduced in the paper "[LaMini: Distilling Knowledge from Large Language Models]()". It is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the [LaMini dataset](), which contains 2.58M samples for instruction fine-tuning. For more information about our dataset, please refer to our [project repository]().
 ## Training Procedure
 We initialize with [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) and fine-tune it on our [LaMini dataset](). Its total number of parameters is 77M.
@@ -41,124 +99,12 @@ The following hyperparameters were used during training:
 ## Evaluation
 We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more detail, please refer to our [paper]().
-## More Models
-You can download the LaMini model series as follows. Note that not all models perform equally well. More details can be found in our [paper]().
-<details>
-<summary> Click to expand </summary>
-<table>
-    <caption>
-    LaMini Language Models collection.
-  </caption>
-  <thead>
-    <tr>
-      <th>Name</th>
-      <th>Architecture</th>
-      <th>Initialization</th>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-      <td>LaMini-T5-61M</td>
-      <td>encoder-decoder</td>
-      <td>T5-small</td>
-    </tr>
-    <tr>
-      <td>LaMini-T5-223M</td>
-      <td>encoder-decoder</td>
-      <td>T5-base</td>
-    </tr>
-    <tr>
-      <td>LaMini-T5-738M</td>
-      <td>encoder-decoder</td>
-      <td>T5-large</td>
-    </tr>
-    <tr>
-      <td>LaMini-Flan-T5-77M</td>
-      <td>encoder-decoder</td>
-      <td>Flan-T5-small</td>
-    </tr>
-    <tr>
-      <td>LaMini-Flan-T5-248M</td>
-      <td>encoder-decoder</td>
-      <td>Flan-T5-base</td>
-    </tr>
-    <tr>
-      <td>LaMini-Flan-T5-783M</td>
-      <td>encoder-decoder</td>
-      <td>Flan-T5-large</td>
-    </tr>
-    <tr>
-      <td>LaMini-Cb-111M</td>
-      <td>decoder-only</td>
-      <td>Cerebras-GPT-111M</td>
-    </tr>
-    <tr>
-      <td>LaMini-Cb-256M</td>
-      <td>decoder-only</td>
-      <td>Cerebras-GPT-256M</td>
-    </tr>
-    <tr>
-      <td>LaMini-Cb-590M</td>
-      <td>decoder-only</td>
-      <td>Cerebras-GPT-590M</td>
-    </tr>
-    <tr>
-      <td>LaMini-Cb-1.3B</td>
-      <td>decoder-only</td>
-      <td>Cerebras-GPT-1.3B</td>
-    </tr>
-    <tr>
-      <td>LaMini-GPT-124M</td>
-      <td>decoder-only</td>
-      <td>GPT-2</td>
-    </tr>
-    <tr>
-      <td>LaMini-GPT-774M</td>
-      <td>decoder-only</td>
-      <td>GPT-2 large</td>
-    </tr>
-    <tr>
-      <td>LaMini-GPT-1.5B</td>
-      <td>decoder-only</td>
-      <td>GPT-2 xl</td>
-    </tr>
-  </tbody>
-</table>
-</details>
 ## Use
 ### Intended use
 We recommend using this model to respond to human instructions written in natural language.
 The following shows how to load and use our model with the HuggingFace `pipeline()`.
-### CPU
-<details>
-<summary> Click to expand </summary>
-```python
-# pip install -q transformers
-from transformers import pipeline
-checkpoint = "{model_name}"
-generator = pipeline('text2text-generation', model=checkpoint, use_auth_token=True)
-input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
-generated_text = generator(input_prompt, max_length=512, do_sample=True, repetition_penalty=1.5)[0]['generated_text']
-print("Response:", generated_text)
-```
-</details>
-### GPU
-<details>
-<summary> Click to expand </summary>
 ```python
 # pip install -q transformers
@@ -169,13 +115,11 @@ checkpoint = "{model_name}"
 generator = pipeline('text2text-generation', model=checkpoint, use_auth_token=True, device=0)
 input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
-generated_text = generator(input_prompt, max_length=512, do_sample=True, repetition_penalty=1.5)[0]['generated_text']
 print("Response:", generated_text)
 ```
-</details>
 ## Limitations
 More information needed

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+# LaMini-FLAN-T5-77M
 This model is part of our LaMini model series, introduced in the paper "[LaMini: Distilling Knowledge from Large Language Models]()". It is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the [LaMini dataset](), which contains 2.58M samples for instruction fine-tuning. For more information about our dataset, please refer to our [project repository]().
+You can view the other models in the LaMini series below. Note that not all models perform equally well. Models marked with ✩ have the best overall performance for their size and architecture. More details can be found in our paper.
+<table>
+<thead>
+  <tr>
+    <th>Base model</th>
+    <th colspan="4">LaMini series (#parameters)</th>
+  </tr>
+</thead>
+<tbody>
+  <tr>
+    <td>T5</td>
+    <td>LaMini-T5-61M</td>
+    <td>LaMini-T5-223M</td>
+    <td>LaMini-T5-738M</td>
+    <td></td>
+  </tr>
+   <tr>
+        <td>Flan-T5</td>
+        <td>LaMini-Flan-T5-77M</td>
+        <td>LaMini-Flan-T5-248M</td>
+        <td>LaMini-Flan-T5-783M</td>
+    <td></td>
+  </tr>
+    <tr>
+    <td>Cerebras-GPT</td>
+    <td>LaMini-Cerebras-111M</td>
+    <td>LaMini-Cerebras-256M</td>
+    <td>LaMini-Cerebras-590M</td>
+    <td>LaMini-Cerebras-1.3B</td>
+  </tr>
+  <tr>
+    <td>GPT-2</td>
+    <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-124m" target="_blank" rel="noopener noreferrer">LaMini-GPT-124M</a></td>
+    <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-774m" target="_blank" rel="noopener noreferrer">LaMini-GPT-774M</a></td>
+    <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-1.5b" target="_blank" rel="noopener noreferrer">LaMini-GPT-1.5B</a></td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>GPT-Neo</td>
+    <td>LaMini-Neo-125M</td>
+    <td>LaMini-Neo-1.3B</td>
+    <td></td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>GPT-J</td>
+    <td colspan="4">coming soon</td>
+  </tr>
+  <tr>
+    <td>LLaMA</td>
+    <td colspan="4">coming soon</td>
+  </tr>
+</tbody>
+</table>
 ## Training Procedure
 We initialize with [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) and fine-tune it on our [LaMini dataset](). Its total number of parameters is 77M.
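
One way to check the stated parameter count is to sum the element counts of the model's parameters. This is only a minimal sketch: `count_parameters` and `load_and_count` are hypothetical helper names, and the model ID in the usage comment is assumed from this repository's name rather than given in the card.

```python
# pip install -q transformers torch

def count_parameters(model) -> int:
    """Sum the number of elements over everything yielded by model.parameters()."""
    return sum(p.numel() for p in model.parameters())


def load_and_count(checkpoint: str) -> int:
    """Download a seq2seq checkpoint from the Hub and count its parameters."""
    # Imported lazily so count_parameters() can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM

    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    return count_parameters(model)


# Example call (assumed model ID; downloads the weights on first use):
# n = load_and_count("MBZUAI/LaMini-Flan-T5-77M")
# print(f"{n / 1e6:.0f}M parameters")
```

The rounded figure can then be compared with the size given in the model name.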
 ## Evaluation
 We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more detail, please refer to our [paper]().
 ## Use
 ### Intended use
 We recommend using this model to respond to human instructions written in natural language.
 The following shows how to load and use our model with the HuggingFace `pipeline()`.
 ```python
 # pip install -q transformers
 from transformers import pipeline

 checkpoint = "{model_name}"  # template placeholder; replace with the model ID
 # device=0 places the pipeline on the first GPU
 generator = pipeline('text2text-generation', model=checkpoint, use_auth_token=True, device=0)
 input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
+generated_text = generator(input_prompt, max_length=512, do_sample=True)[0]['generated_text']
 print("Response:", generated_text)
 ```
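
As a hedged variation on the snippet above, the sketch below picks the device at run time so the same code also runs on machines without a GPU. The helper names `pick_device` and `respond` are hypothetical, and the default model ID `MBZUAI/LaMini-Flan-T5-77M` is an assumption based on this repository's name, not something the card states.

```python
# pip install -q transformers torch

def pick_device() -> int:
    """Return 0 (first CUDA GPU) if PyTorch can see one, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


def respond(instruction: str, checkpoint: str = "MBZUAI/LaMini-Flan-T5-77M") -> str:
    """Generate one response to a natural-language instruction.

    The default checkpoint is an assumed Hub ID; pass your own if it differs.
    """
    # Imported lazily so pick_device() works without transformers installed.
    from transformers import pipeline

    generator = pipeline("text2text-generation", model=checkpoint, device=pick_device())
    outputs = generator(instruction, max_length=512, do_sample=True)
    return outputs[0]["generated_text"]


# Example call (downloads the checkpoint on first use):
# print("Response:", respond('Describe why "Barcelona, Spain" deserves a visit.'))
```

For repeated calls it would likely be better to build the pipeline once and reuse it, since `respond` as written reloads the model each time.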
 ## Limitations
 More information needed