Create README.md
README.md
ADDED
@@ -0,0 +1,92 @@
1 |
+
---
|
2 |
+
license: apache-2.0
|
3 |
+
datasets:
|
4 |
+
- yahma/alpaca-cleaned
|
5 |
+
- tatsu-lab/alpaca
|
6 |
+
language:
|
7 |
+
- en
|
8 |
+
library_name: transformers
|
9 |
+
pipeline_tag: text-generation
|
10 |
+
---
|
11 |
+
# Model Card for Alpaca Cerebras-6.7B LoRA
|
12 |
+
|
13 |
+
This repository contains the LoRA adapter weights for the [Cerebras-GPT-6.7B](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) model finetuned on the
|
14 |
+
cleaned version of the Alpaca dataset, following the approach of [github.com/tloen/alpaca-lora](https://github.com/tloen/alpaca-lora). Find the code used
|
15 |
+
for finetuning at our fork: [github.com/bjoernpl/cerebras-lora](https://github.com/bjoernpl/cerebras-lora).
|
16 |
+
|
17 |
+
## Model Details
|
18 |
+
|
19 |
+
### Model Description
|
20 |
+
|
21 |
+
_Copied from [cerebras/Cerebras-GPT-6.7B](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) model card:_
|
22 |
+
|
23 |
+
The Cerebras-GPT family is released to facilitate research into LLM scaling laws using open architectures and data sets, and to demonstrate the simplicity and scalability of training LLMs on the Cerebras software and hardware stack. All Cerebras-GPT models are available on Hugging Face.
|
24 |
+
|
25 |
+
The family includes 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B models.
|
26 |
+
|
27 |
+
All models in the Cerebras-GPT family have been trained in accordance with Chinchilla scaling laws (20 tokens per model parameter), which is compute-optimal.
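
As a rough illustration of the 20-tokens-per-parameter rule, the sketch below computes the implied training-token budget for each model size listed above. It assumes the nominal parameter counts from the model names; the helper `compute_optimal_tokens` is a hypothetical name introduced here, and the token counts actually used in training may differ from these round figures.

```python
# Illustrative only: token budgets implied by "20 tokens per model parameter".
# Parameter counts are the nominal sizes from the model names (an assumption),
# not exact counts taken from the model configs.
TOKENS_PER_PARAMETER = 20

NOMINAL_PARAMS = {
    "111M": 111e6,
    "256M": 256e6,
    "590M": 590e6,
    "1.3B": 1.3e9,
    "2.7B": 2.7e9,
    "6.7B": 6.7e9,
    "13B": 13e9,
}


def compute_optimal_tokens(n_params, tokens_per_param=TOKENS_PER_PARAMETER):
    """Return the training-token budget implied by a fixed tokens-per-parameter ratio."""
    return n_params * tokens_per_param


for name, n_params in NOMINAL_PARAMS.items():
    budget_billions = compute_optimal_tokens(n_params) / 1e9
    print(f"{name}: ~{budget_billions:.1f}B tokens")
```

Under this assumption, the 6.7B base model used here corresponds to roughly 134B training tokens.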
|
28 |
+
|
29 |
+
These models were trained on the Andromeda AI supercomputer, which comprises 16 CS-2 wafer-scale systems. Cerebras' weight streaming technology simplifies the training of LLMs by disaggregating compute from model storage. This allowed for efficient scaling of training across nodes using simple data parallelism.
|
30 |
+
|
31 |
+
Cerebras systems for pre-training and fine tuning are available in the cloud via the Cerebras Model Studio. Cerebras CS-2 compatible checkpoints are available in Cerebras Model Zoo.
|
32 |
+
|
33 |
+
|
34 |
+
* Developed by: [Cerebras Systems](https://www.cerebras.net/); finetuned by [Björn P.](https://github.com/bjoernpl)
|
35 |
+
* License: Apache 2.0
|
36 |
+
* Model type: Transformer-based Language Model
|
37 |
+
* Architecture: GPT-3 style architecture with LoRA adapter
|
38 |
+
* Data set: The Pile (base model pre-training); cleaned Alpaca dataset (LoRA finetuning)
|
39 |
+
* Tokenizer: Byte Pair Encoding
|
40 |
+
* Vocabulary Size: 50257
|
41 |
+
* Sequence Length: 2048
|
42 |
+
* Optimizer: AdamW, (β1, β2) = (0.9, 0.95), adam_eps = 1e−8 (1e−9 for larger models)
|
43 |
+
* Positional Encoding: Learned
|
44 |
+
* Language: English
|
45 |
+
* Learn more: Dense Scaling Laws Paper for training procedure, config files, and details on how to use.
|
46 |
+
|
47 |
+
|
48 |
+
## Quickstart
|
49 |
+
See [github.com/bjoernpl/cerebras-lora](https://github.com/bjoernpl/cerebras-lora) for a Gradio demo and more code.
|
50 |
+
|
51 |
+
The base model can be loaded with `AutoModelForCausalLM`, and the adapter weights are then applied on top with `PeftModel` from the `peft` library:
|
52 |
+
```python
from peft import PeftModel  # wraps the base model with the LoRA adapter
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the base model and its tokenizer, then apply the adapter weights on top.
tokenizer = AutoTokenizer.from_pretrained("cerebras/Cerebras-GPT-6.7B")
model = AutoModelForCausalLM.from_pretrained("cerebras/Cerebras-GPT-6.7B")
model = PeftModel.from_pretrained(model, "bjoernp/alpaca-cerebras-6.7B")
text = "Generative AI is "
```
|
59 |
+
|
60 |
+
The loaded model can then be used with Hugging Face pipelines:
|
61 |
+
|
62 |
+
```python
from transformers import pipeline

# Reuses `model`, `tokenizer`, and `text` from the loading snippet above.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
generated_text = pipe(text, max_length=50, do_sample=False, no_repeat_ngram_size=2)[0]
print(generated_text['generated_text'])
```
|
68 |
+
|
69 |
+
or with `model.generate()`:
|
70 |
+
|
71 |
+
```python
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, num_beams=5,
                         max_new_tokens=50, early_stopping=True,
                         no_repeat_ngram_size=2)
text_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(text_output[0])
```
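
Because the adapter was finetuned on Alpaca-style instruction data, it may respond better to instruction-formatted prompts than to a bare text prefix. The sketch below builds such a prompt using the template from the original Alpaca / alpaca-lora projects. That this fork kept the same template unchanged is an assumption, and `build_alpaca_prompt` is a hypothetical helper name, so check the finetuning code before relying on it.

```python
from typing import Optional


def build_alpaca_prompt(instruction: str, input_text: Optional[str] = None) -> str:
    """Format an instruction (and optional input) with the Alpaca prompt template.

    Assumption: the adapter was trained with the unchanged alpaca-lora template.
    """
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )


prompt = build_alpaca_prompt("Explain what a LoRA adapter is in one sentence.")
print(prompt)
```

The resulting `prompt` string can be passed in place of `text` to either the pipeline or `model.generate()`.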
|
79 |
+
<br><br>
|
80 |
+
|
81 |
+
|
82 |
+
## Environmental Impact
|
83 |
+
|
84 |
+
Experiments were conducted on private infrastructure with a carbon efficiency of 0.432 kgCO<sub>2</sub>eq/kWh. A cumulative 5 hours of computation was performed on an RTX 3090Ti (TDP of 450W).
|
85 |
+
|
86 |
+
Total emissions are estimated to be 0.97 kgCO<sub>2</sub>eq, of which 0 percent was directly offset.
|
87 |
+
|
88 |
+
Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
|
89 |
+
|
90 |
+
- **Hardware Type:** RTX 3090Ti
|
91 |
+
- **Hours used:** 5
|
92 |
+
- **Carbon Emitted:** 0.97 kgCO<sub>2</sub>eq
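
The reported figure can be reproduced from the numbers above. The sketch below assumes the estimate is simply energy (hours × TDP) multiplied by the carbon efficiency, with the GPU drawing its full TDP for the entire run; this matches the reported 0.97 kgCO<sub>2</sub>eq, but the calculator's exact method is not stated here. The function name `estimate_emissions_kg` is hypothetical.

```python
# Reproduce the emissions estimate from the figures reported in this section.
HOURS_USED = 5             # cumulative computation time, in hours
GPU_TDP_WATTS = 450        # RTX 3090Ti thermal design power
CARBON_INTENSITY = 0.432   # kgCO2eq per kWh


def estimate_emissions_kg(hours, tdp_watts, kg_per_kwh):
    """Estimate emissions, assuming the GPU draws its full TDP throughout."""
    energy_kwh = hours * tdp_watts / 1000.0
    return energy_kwh * kg_per_kwh


emissions = estimate_emissions_kg(HOURS_USED, GPU_TDP_WATTS, CARBON_INTENSITY)
print(f"{emissions:.2f} kgCO2eq")  # prints "0.97 kgCO2eq"
```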
|