Solshine committed · verified
Commit c193b74 · 1 Parent(s): 0878ec9

Update README.md

Files changed (1):
  1. README.md  +28  -0
README.md CHANGED
@@ -74,33 +74,61 @@ Step Training Loss

The merged model looks like the following PyTorch readout (printed module structure):

```
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(64000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear4bit(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((4096,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=64000, bias=False)
)
```
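A readout like the one above can be reproduced by loading the merged checkpoint with 4-bit quantization and printing the model. The snippet below is a minimal sketch; the repo ID and quantization settings are assumptions, not values stated in this README.

```python
# Minimal sketch: load a merged checkpoint in 4-bit and print its module tree.
# "Solshine/your-merged-model" is a placeholder repo ID, not the actual model name.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # linear layers load as Linear4bit
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
)

model = AutoModelForCausalLM.from_pretrained(
    "Solshine/your-merged-model",           # placeholder; substitute the real repo ID
    quantization_config=bnb_config,
    device_map="auto",
)

print(model)  # prints a module tree like the readout above
```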
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
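As a rough illustration of what an Unsloth + TRL fine-tuning setup looks like, here is a sketch written against the commonly documented Unsloth/TRL APIs (which may differ by version); the base checkpoint, dataset, and hyperparameters are placeholders and assumptions, not the recipe used for this model.

```python
# Illustrative sketch of QLoRA-style fine-tuning with Unsloth and TRL's SFTTrainer.
# Base model, dataset, and hyperparameters are assumptions, not the actual recipe.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-2-7b-bnb-4bit",  # assumed 4-bit base checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder data

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # assumed field name in the placeholder dataset
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()

# After training, Unsloth's merge helper can fold the LoRA adapters back into the
# base weights, producing a standalone "merged" model like the one described above.
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
```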
 