minghaowu committed on
Commit
845b3ba
1 Parent(s): f02590e

Create README.md

Files changed (1)
  1. README.md +151 -0
README.md ADDED
@@ -0,0 +1,151 @@
+ ---
+ license: cc-by-nc-4.0
+ language:
+ - en
+ pipeline_tag: text-generation
+ widget:
+ - text: >-
+     Below is an instruction that describes a task.
+
+     Write a response that appropriately completes the request.
+
+
+
+     ### Instruction:
+
+     how can I become more healthy?
+
+
+
+     ### Response:
+   example_title: example
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ <p align="center" width="100%">
+ <a><img src="https://raw.githubusercontent.com/mbzuai-nlp/lamini/main/images/LaMnin.png" alt="Title" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
+ </p>
+
+ # LaMini-GPT-774M
+
+ [![Model License](https://img.shields.io/badge/Model%20License-CC%20By%20NC%204.0-red.svg)]()
+
+ This model is part of the LaMini model series introduced in the paper "[LaMini: A Diverse Herd of Distilled Models from Large-Scale Instructions](https://github.com/mbzuai-nlp/lamini)".
+ It is a fine-tuned version of [gpt2-large](https://huggingface.co/gpt2-large) on the [LaMini dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction), which contains 2.58M samples for instruction fine-tuning. For more information about the dataset, please refer to our [project repository](https://github.com/mbzuai-nlp/lamini/).
+ The other models in the LaMini series are listed below. Not all models perform equally well; models marked with ✩ have the best overall performance for their size and architecture. See our paper for details.
+
+ <table>
+ <thead>
+ <tr>
+ <th>Base model</th>
+ <th colspan="4">LaMini series (#parameters)</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td>T5</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-61m" target="_blank" rel="noopener noreferrer">LaMini-T5-61M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-223m" target="_blank" rel="noopener noreferrer">LaMini-T5-223M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-738m" target="_blank" rel="noopener noreferrer">LaMini-T5-738M</a></td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>Flan-T5</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-77m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-77M</a>✩</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-248m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-248M</a>✩</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-783m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-783M</a>✩</td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>Cerebras-GPT</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-111m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-111M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-256m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-256M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-590m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-590M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-1.3b" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-1.3B</a></td>
+ </tr>
+ <tr>
+ <td>GPT-2</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-124m" target="_blank" rel="noopener noreferrer">LaMini-GPT-124M</a>✩</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-774m" target="_blank" rel="noopener noreferrer">LaMini-GPT-774M</a>✩</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-1.5b" target="_blank" rel="noopener noreferrer">LaMini-GPT-1.5B</a>✩</td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>GPT-Neo</td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-neo-125m" target="_blank" rel="noopener noreferrer">LaMini-Neo-125M</a></td>
+ <td><a href="https://huggingface.co/MBZUAI/lamini-neo-1.3b" target="_blank" rel="noopener noreferrer">LaMini-Neo-1.3B</a></td>
+ <td></td>
+ <td></td>
+ </tr>
+ <tr>
+ <td>GPT-J</td>
+ <td colspan="4">coming soon</td>
+ </tr>
+ <tr>
+ <td>LLaMA</td>
+ <td colspan="4">coming soon</td>
+ </tr>
+
+
+ </tbody>
+ </table>
+
+
+ ## Use
+
+ ### Intended use
+ We recommend using the model to respond to human instructions written in natural language.
+ Because this decoder-only model was fine-tuned with wrapper text around each instruction, we suggest using the same wrapper text at inference time for the best performance.
+ See the example on the right or the code below.
+
+ The following example loads and runs the model with the Hugging Face `pipeline()` API.
+
+ ```python
+ # pip install -q transformers
+ from transformers import pipeline
+
+ checkpoint = "MBZUAI/lamini-gpt-774m"  # Hugging Face Hub repository of this model
+
+ generator = pipeline('text-generation', model=checkpoint)  # downloads the weights on first use
+
+ instruction = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
+
+ input_prompt = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"  # same wrapper text as in fine-tuning
+
+ generated_text = generator(input_prompt, max_length=512, do_sample=True)[0]['generated_text']  # prompt plus sampled continuation
+
+ print("Response:", generated_text)
+ ```
+
+ ## Training Procedure
+
+ <p align="center" width="100%">
+ <a><img src="https://raw.githubusercontent.com/mbzuai-nlp/lamini/main/images/lamini-pipeline.drawio.png" alt="Title" style="width: 100%; min-width: 250px; display: block; margin: auto;"></a>
+ </p>
+
+ We initialize with [gpt2-large](https://huggingface.co/gpt2-large) and fine-tune it on our [LaMini dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction). The model has 774M parameters in total.
+
+ ### Training Hyperparameters
+
+
+
+ ## Evaluation
+ We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more details, please refer to our [paper]().
+
+ ## Limitations
+
+ More information needed
+
+
+ # Citation
+
+ ```bibtex
+ @misc{lamini,
+ title={LaMini: A Diverse Herd of Distilled Models from Large-Scale Instructions},
+ author={},
+ year={2023},
+ publisher = {GitHub},
+ journal = {GitHub repository},
+ }
+ ```
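
The "Intended use" section of the README recommends wrapping every instruction in the same text that was used during fine-tuning. A minimal sketch of that wrapper as a reusable helper might look like the following. The names `PROMPT_TEMPLATE`, `build_prompt`, and `extract_response` are hypothetical and do not come from the LaMini repository. The sketch also assumes the `text-generation` pipeline echoes the prompt at the start of `generated_text`, which is its default behavior.

```python
# Hypothetical helpers, not part of the LaMini codebase.
# The template text is copied from the README's usage example.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str) -> str:
    """Wrap a plain instruction in the fine-tuning wrapper text."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def extract_response(generated_text: str, prompt: str) -> str:
    """Remove the echoed prompt, if present, and surrounding whitespace."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    return generated_text.strip()


if __name__ == "__main__":
    prompt = build_prompt("how can I become more healthy?")
    # Stand-in for generator(prompt, ...)[0]["generated_text"]
    fake_output = prompt + " Eat balanced meals and exercise regularly."
    print(extract_response(fake_output, prompt))
```

With the pipeline from the README, `build_prompt(instruction)` would replace the hand-written f-string, and `extract_response` would be applied to the returned `generated_text` so that only the model's answer is printed.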