RichardErkhov committed on
Commit 0861cf7 · verified · 1 Parent(s): c5a7792

uploaded readme

Files changed (1): README.md ADDED (+97 -0)

Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

MagicPrompt-tinystories-33M-epoch10-merged - bnb 8bits
- Model creator: https://huggingface.co/Technotech/
- Original model: https://huggingface.co/Technotech/MagicPrompt-tinystories-33M-epoch10-merged/

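As a quick orientation (not part of the original card), the sketch below shows one way to load this checkpoint in 8-bit through `transformers` and `bitsandbytes`. The model id used is the original merged checkpoint linked above; substitute this repo's id if you want the pre-quantized weights.

```python
# Minimal 8-bit loading sketch; assumes the standard transformers + bitsandbytes
# workflow implied by the "bnb 8bits" label, not code taken from the original card.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Technotech/MagicPrompt-tinystories-33M-epoch10-merged"  # original model linked above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # load weights in 8-bit via bitsandbytes
    device_map="auto",
)
```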
Original model description:
---
library_name: transformers
license: apache-2.0
datasets:
- Gustavosta/Stable-Diffusion-Prompts
language:
- en
tags:
- completion
widget:
- text: A picture of
- text: photo of
- text: a drawing of
inference:
  parameters:
    max_new_tokens: 20
    do_sample: True
    early_stopping: True
    temperature: 1.2
    num_beams: 5
    no_repeat_ngram_size: 2
    repetition_penalty: 1.35
    top_k: 50
    top_p: 0.75
---
# MagicPrompt TinyStories-33M (Merged)

## Info

Magic prompt completion model trained on a dataset of 80k Stable Diffusion prompts. Base model: TinyStories-33M. Inspired by [MagicPrompt-Stable-Diffusion](https://huggingface.co/Gustavosta/MagicPrompt-Stable-Diffusion).

The model performs reasonably well for its 33M parameters thanks to the TinyStories base, but it clearly lacks much understanding of almost anything. Still, considering the size, I think it's decent. Whether you would use this over a small GPT-2-based model is up to you.

## Examples

Best generation settings I found: `max_new_tokens=40, do_sample=True, temperature=1.2, num_beams=10, no_repeat_ngram_size=2, early_stopping=True, repetition_penalty=1.35, top_k=50, top_p=0.55, eos_token_id=tokenizer.eos_token_id, pad_token_id=0` (there may be better settings).
`no_repeat_ngram_size` is important for making sure the model doesn't repeat phrases (as it is quite small).

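A minimal sketch of those settings in code (not from the original card; the model id is the merged checkpoint listed at the top of this page):

```python
# Generation sketch using the settings quoted above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Technotech/MagicPrompt-tinystories-33M-epoch10-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("A picture of", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        temperature=1.2,
        num_beams=10,
        no_repeat_ngram_size=2,
        early_stopping=True,
        repetition_penalty=1.35,
        top_k=50,
        top_p=0.55,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=0,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```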
(Bold text is generated by the model)

"found footage of a ufo **in the forest, by lusax, wlop, greg rutkowski, stanley artgerm, highly detailed, intricate, digital painting, artstation, concept art, smooth**"

"A close shot of a bird in a jungle, **with two legs, with long hair on a tall, long brown body, long white skin, sharp teeth, high bones, digital painting, artstation, concept art, illustration by wlop,**"

"Camera shot of **a strange young girl wearing a cloak, wearing a mask in clothes, with long curly hair, long hair, black eyes, dark skin, white teeth, long brown eyes eyes, big eyes, sharp**"

"An illustration of a house, stormy weather, **sun, moonlight, night, concept art, 4 k, wlop, by wlop, by jose stanley, ilya kuvshinov, sprig**"

"A field of flowers, camera shot, 70mm lens, **fantasy, intricate, highly detailed, artstation, concept art, sharp focus, illustration, illustration, artgerm jake daggaws, artgerm and jaggodieie brad**"

## Next steps

- Larger dataset, e.g. [neuralworm/stable-diffusion-discord-prompts](https://huggingface.co/datasets/neuralworm/stable-diffusion-discord-prompts) or [daspartho/stable-diffusion-prompts](https://huggingface.co/datasets/daspartho/stable-diffusion-prompts)
- More epochs
- Instead of going smaller than GPT-2 (137M), fine-tune a 1-7B parameter model

## Training config

- Rank 16 LoRA
- Trained on Gustavosta/Stable-Diffusion-Prompts for 10 epochs
- Batch size of 64

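The card does not include the training script; the sketch below is a hedged reconstruction of a rank-16 LoRA setup with `peft`. The base checkpoint id (`roneneldan/TinyStories-33M`), `lora_alpha`, dropout, and target modules are assumptions, not values taken from the card.

```python
# Hedged LoRA setup sketch (only the rank, dataset, epochs and batch size come
# from the card; everything else here is an assumption).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("roneneldan/TinyStories-33M")  # assumed base repo id

lora_config = LoraConfig(
    r=16,                                 # "Rank 16 LoRA"
    lora_alpha=32,                        # assumption
    lora_dropout=0.05,                    # assumption
    target_modules=["q_proj", "v_proj"],  # assumed attention projections for the GPT-Neo-style base
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# Train with your preferred Trainer loop: 10 epochs, effective batch size 64,
# on Gustavosta/Stable-Diffusion-Prompts (per the card).
```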
## Training procedure

The following `bitsandbytes` quantization config was used during training (a hedged sketch of it follows the list):
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: fp4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float32

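For reference, the listed values map onto a `transformers` `BitsAndBytesConfig` roughly as follows (a reconstruction, not the original training code):

```python
# Reconstruction of the quantization config listed above (4-bit fp4, float32 compute).
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float32,
)
# Pass as quantization_config= to AutoModelForCausalLM.from_pretrained(...) to
# load the base model in 4-bit before attaching the LoRA adapters.
```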
### Framework versions

- PEFT 0.5.0.dev0