The merged model looks like the following (PyTorch readout of the printed model):

```
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(64000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear4bit(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((4096,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=64000, bias=False)
)
```
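
A readout like the one above can be reproduced by loading the merged checkpoint in 4-bit and printing the module tree. The snippet below is a minimal sketch, assuming a bitsandbytes 4-bit load; the repository id is a placeholder, not this model's actual name:

```python
# Minimal sketch: load the merged checkpoint in 4-bit and print its module tree.
# "your-username/your-merged-llama" is a placeholder repo id, not this model's name.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # quantize linear layers to 4-bit (Linear4bit)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "your-username/your-merged-llama",  # placeholder
    quantization_config=bnb_config,
    device_map="auto",
)

print(model)  # prints the LlamaForCausalLM structure shown above
```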
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
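
A rough outline of that Unsloth + TRL workflow is sketched below. This is a hedged illustration rather than this model's exact training recipe: the base checkpoint id, dataset, and hyperparameters are all placeholders.

```python
# Hedged sketch of an Unsloth + TRL QLoRA fine-tune; not this model's exact recipe.
from unsloth import FastLanguageModel
from datasets import Dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="your-username/base-llama-7b",  # placeholder base model
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Tiny placeholder dataset with a single "text" column.
dataset = Dataset.from_dict({"text": ["### Instruction: Say hi.\n### Response: Hi!"]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(per_device_train_batch_size=1, max_steps=10, output_dir="outputs"),
)
trainer.train()

# Merge the LoRA weights into the base model and save the merged checkpoint.
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
```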