---
license: apache-2.0
datasets:
- yahma/alpaca-cleaned
- tatsu-lab/alpaca
language:
- en
library_name: transformers
pipeline_tag: text-generation
---
# Model Card for Alpaca Cerebras-6.7B LoRA

This repository contains the adapter weights for the [Cerebras-6.7B](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) model finetuned on the cleaned version of the Alpaca dataset following [github.com/tloen/alpaca-lora](https://github.com/tloen/alpaca-lora). The code used for finetuning is available in our fork: [github.com/bjoernpl/cerebras-lora](https://github.com/bjoernpl/cerebras-lora).
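
To make the finetuning setup concrete, the sketch below shows roughly how a LoRA adapter is attached to the base model with the `peft` library, following the alpaca-lora recipe. The rank, scaling, dropout, and `target_modules` values are illustrative assumptions, not the exact configuration used in the fork.

```python
# Sketch of attaching a LoRA adapter to the base model with peft, following the
# alpaca-lora recipe. Hyperparameters and target_modules are illustrative
# assumptions; Cerebras-GPT uses a GPT-2 style fused attention projection ("c_attn").
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("cerebras/Cerebras-GPT-6.7B")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # assumed attention projection for GPT-2 style models
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```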

## Model Details

### Model Description

_Copied from the [cerebras/Cerebras-GPT-6.7B](https://huggingface.co/cerebras/Cerebras-GPT-6.7B) model card:_

The Cerebras-GPT family is released to facilitate research into LLM scaling laws using open architectures and data sets, and to demonstrate the simplicity and scalability of training LLMs on the Cerebras software and hardware stack. All Cerebras-GPT models are available on Hugging Face.

The family includes 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B models.

All models in the Cerebras-GPT family have been trained in accordance with Chinchilla scaling laws (20 tokens per model parameter), which is compute-optimal.

These models were trained on the Andromeda AI supercomputer, comprising 16 CS-2 wafer-scale systems. Cerebras' weight streaming technology simplifies the training of LLMs by disaggregating compute from model storage. This allowed for efficient scaling of training across nodes using simple data parallelism.

Cerebras systems for pre-training and fine-tuning are available in the cloud via the Cerebras Model Studio. Cerebras CS-2 compatible checkpoints are available in the Cerebras Model Zoo.

* Developed by: [Cerebras Systems](https://www.cerebras.net/); finetuned by [Björn P.](https://github.com/bjoernpl).
* License: Apache 2.0
* Model type: Transformer-based Language Model
* Architecture: GPT-3 style architecture with LoRA adapter
* Data set: The Pile (base model pre-training); cleaned Alpaca (LoRA finetuning)
* Tokenizer: Byte Pair Encoding
* Vocabulary Size: 50257
* Sequence Length: 2048 (see the quick check below)
* Optimizer: AdamW, (β1, β2) = (0.9, 0.95), adam_eps = 1e-8 (1e-9 for larger models)
* Positional Encoding: Learned
* Language: English
* Learn more: see the Dense Scaling Laws Paper for the training procedure, config files, and details on how to use the models.
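
As a quick sanity check, the vocabulary size and maximum sequence length listed above can be read from the base model's configuration. A minimal sketch, assuming the GPT-2 style attribute names (`vocab_size`, `n_positions`) used by Cerebras-GPT's config:

```python
# Read the base model configuration and compare it with the values listed above.
# Cerebras-GPT is distributed as a GPT-2 style model, so the attribute names below
# follow GPT2Config; this is an assumption, inspect `config` directly if they differ.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("cerebras/Cerebras-GPT-6.7B")
print(config.model_type)   # expected: "gpt2"
print(config.vocab_size)   # expected: 50257
print(config.n_positions)  # expected: 2048 (maximum sequence length)
```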

## Quickstart

See [github.com/bjoernpl/cerebras-lora](https://github.com/bjoernpl/cerebras-lora) for a Gradio demo and more code.

The model can be loaded with `AutoModelForCausalLM`, with the adapter applied via `PeftModel` from the `peft` library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("cerebras/Cerebras-GPT-6.7B")
model = AutoModelForCausalLM.from_pretrained("cerebras/Cerebras-GPT-6.7B")
model = PeftModel.from_pretrained(model, "bjoernp/alpaca-cerebras-6.7B")
text = "Generative AI is "
```

The model can then be used with Hugging Face pipelines:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
generated_text = pipe(text, max_length=50, do_sample=False, no_repeat_ngram_size=2)[0]
print(generated_text['generated_text'])
```

or with `model.generate()`:

```python
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(
    **inputs,
    num_beams=5,
    max_new_tokens=50,
    early_stopping=True,
    no_repeat_ngram_size=2,
)
text_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(text_output[0])
```
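
Note that the adapter was trained on Alpaca-style instruction data, so the plain completion prompt above mainly demonstrates loading. For instruction following, prompts are usually wrapped in the Alpaca template from [github.com/tloen/alpaca-lora](https://github.com/tloen/alpaca-lora). A minimal sketch, assuming the fork keeps the same template (the `build_prompt` helper below is illustrative):

```python
# Wrap an instruction in the standard Alpaca prompt template before generation.
# Assumption: the cerebras-lora fork uses the same template as tloen/alpaca-lora;
# check the fork's finetuning/generation scripts to confirm.
def build_prompt(instruction: str, input_text: str = "") -> str:
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

prompt = build_prompt("Explain in two sentences what generative AI is.")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```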

## Environmental Impact

Experiments were conducted using private infrastructure with a carbon efficiency of 0.432 kgCO<sub>2</sub>eq/kWh. A cumulative 5 hours of computation was performed on hardware of type RTX 3090Ti (TDP of 450W).

Total emissions are estimated to be 0.97 kgCO<sub>2</sub>eq, of which 0 percent was directly offset.
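
For reference, this figure is consistent with a back-of-the-envelope estimate from the numbers above (energy = hours × TDP, emissions = energy × carbon intensity):

```python
# Back-of-the-envelope check of the reported emissions estimate.
hours = 5                 # cumulative hours of computation
tdp_kw = 0.450            # RTX 3090Ti TDP in kW (450 W)
kg_co2_per_kwh = 0.432    # carbon efficiency of the infrastructure

energy_kwh = hours * tdp_kw              # 2.25 kWh
emissions = energy_kwh * kg_co2_per_kwh  # ~0.97 kgCO2eq
print(f"{emissions:.2f} kgCO2eq")
```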

Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** RTX 3090Ti
- **Hours used:** 5
- **Carbon Emitted:** 0.97 kgCO<sub>2</sub>eq