erwanf committed on
Commit
f12cc7e
1 Parent(s): ad8874c

Create README.md

Files changed (1)
  1. README.md +80 -0
README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ license: mit
+ datasets:
+ - Skylion007/openwebtext
+ language:
+ - en
+ metrics:
+ - perplexity
+ pipeline_tag: text-generation
+ ---
+ # GPT-2 Mini
+
+ A smaller GPT-2 model with only 39M parameters. It was pretrained on a subset of OpenWebText, the open-source reproduction of the dataset OpenAI used to pretrain the original GPT-2 models.
+
+ ## Uses
+
+ This model is intended mainly for research and education. Its small size allows for fast experimentation in resource-limited settings, while still being able to generate complex and coherent text.
+
+ ## Getting Started
+
+ Use the code below to get started with the model:
+ ```py
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load model
+ model = AutoModelForCausalLM.from_pretrained("erwanf/gpt2-mini")
+ model.eval()
+
+ # Load tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("erwanf/gpt2-mini")
+
+ # Generate text
+ prompt = "Hello, I'm a language model,"
+ input_ids = tokenizer.encode(prompt, return_tensors="pt")
+
+ output = model.generate(input_ids, do_sample=True, max_length=50, num_return_sequences=5)
+ output_text = tokenizer.batch_decode(output, skip_special_tokens=True)
+ print(output_text)
+ ```
+
+ Output:
+ ```
+ ["Hello, I'm a language model, I can't be more efficient in words.\n\nYou can use this as a point to find out the next bit in your system, and learn more about me.\n\nI think a lot of the",
+ "Hello, I'm a language model, my teacher is a good teacher - a good school teacher – and one thing you have to remember:\n\nIt's not perfect. A school is not perfect; it isn't perfect at all!\n\n",
+ 'Hello, I\'m a language model, but if I can do something for you then go for it (for a word). Here is my blog, the language:\n\nI\'ve not used "normal" in English words, but I\'ve always',
+ 'Hello, I\'m a language model, I\'m talking to you the very first time I used a dictionary and it can be much better than one word in my dictionary. What would an "abnormal" English dictionary have to do with a dictionary and',
+ 'Hello, I\'m a language model, the most powerful representation of words and phrases in the language I\'m using."\n\nThe new rules change that makes it much harder for people to understand a language that does not have a native grammar (even with']
+ ```
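+
+ Since the model card lists perplexity as its evaluation metric, the snippet below is a minimal sketch of how perplexity could be computed with the `model` and `tokenizer` loaded above. The sample text is only an illustrative placeholder, not the validation data used for this card.
+ ```py
+ import torch
+
+ # Illustrative placeholder text; swap in real held-out data for a meaningful score.
+ text = "Hello, I'm a language model, and I generate text one token at a time."
+ enc = tokenizer(text, return_tensors="pt")
+
+ with torch.no_grad():
+     # With labels set to the inputs, the model returns the mean cross-entropy
+     # over the sequence; exponentiating it gives the per-token perplexity.
+     loss = model(enc.input_ids, labels=enc.input_ids).loss
+
+ print(f"Perplexity: {torch.exp(loss).item():.2f}")
+ ```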
+
+ ## Training Details
+
+ The architecture follows GPT-2, with smaller dimensions and fewer layers, and it uses the same tokenizer as GPT-2. We used the first 2M rows of the OpenWebText dataset, out of which 1k rows are held out for the test and validation sets.
+
+ ### Hyperparameters
+
+ | **Hyperparameter**         | **Value**  |
+ |----------------------------|------------|
+ | **Model Parameters**       |            |
+ | Vocabulary Size            | 50,257     |
+ | Context Length             | 512        |
+ | Number of Layers           | 4          |
+ | Hidden Size                | 512        |
+ | Number of Attention Heads  | 8          |
+ | Intermediate Size          | 2048       |
+ | Activation Function        | GELU       |
+ | Dropout                    | No         |
+ | **Training Parameters**    |            |
+ | Learning Rate              | 5e-4       |
+ | Batch Size                 | 256        |
+ | Optimizer                  | AdamW      |
+ | beta1                      | 0.9        |
+ | beta2                      | 0.98       |
+ | Weight Decay               | 0.1        |
+ | Training Steps             | 100,000    |
+ | Warmup Steps               | 4,000      |
+ | Learning Rate Scheduler    | Cosine     |
+ | Training Dataset Size      | 1M samples |
+ | Validation Dataset Size    | 1k samples |
+ | Float Type                 | bf16       |
+
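+ For reference, here is a minimal sketch of how the configuration and optimization setup in the table could be expressed with the Hugging Face `transformers` GPT-2 classes. It is a reconstruction from the table, not the actual training script; in particular, the `gelu_new` activation and the zeroed dropout fields are assumptions based on the GELU and no-dropout entries.
+ ```py
+ import torch
+ from transformers import GPT2Config, GPT2LMHeadModel, get_cosine_schedule_with_warmup
+
+ # Model dimensions taken from the hyperparameter table above.
+ config = GPT2Config(
+     vocab_size=50257,
+     n_positions=512,                 # context length
+     n_embd=512,                      # hidden size
+     n_layer=4,
+     n_head=8,
+     n_inner=2048,                    # intermediate (MLP) size
+     activation_function="gelu_new",  # assumed GPT-2-style GELU
+     resid_pdrop=0.0,                 # table lists no dropout
+     embd_pdrop=0.0,
+     attn_pdrop=0.0,
+ )
+ model = GPT2LMHeadModel(config)
+ n_params = sum(p.numel() for p in model.parameters())
+ print(f"Parameters: {n_params / 1e6:.1f}M")  # roughly the 39M quoted above
+
+ # Optimizer and schedule matching the training parameters in the table.
+ optimizer = torch.optim.AdamW(
+     model.parameters(), lr=5e-4, betas=(0.9, 0.98), weight_decay=0.1
+ )
+ scheduler = get_cosine_schedule_with_warmup(
+     optimizer, num_warmup_steps=4_000, num_training_steps=100_000
+ )
+ ```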