Update README.md
README.md
CHANGED
@@ -71,6 +71,38 @@ Step Training Loss
 37 0.686400
 38 0.724200
 
+
+Merged model looks like the following (printing trainable parameters; pytorch) readout:
+'''
+LlamaForCausalLM(
+  (model): LlamaModel(
+    (embed_tokens): Embedding(64000, 4096)
+    (layers): ModuleList(
+      (0-31): 32 x LlamaDecoderLayer(
+        (self_attn): LlamaSdpaAttention(
+          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
+          (k_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
+          (v_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
+          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
+          (rotary_emb): LlamaRotaryEmbedding()
+        )
+        (mlp): LlamaMLP(
+          (gate_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
+          (up_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
+          (down_proj): Linear4bit(in_features=11008, out_features=4096, bias=False)
+          (act_fn): SiLU()
+        )
+        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
+        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
+      )
+    )
+    (norm): LlamaRMSNorm((4096,), eps=1e-05)
+    (rotary_emb): LlamaRotaryEmbedding()
+  )
+  (lm_head): Linear(in_features=4096, out_features=64000, bias=False)
+)
+'''
+
 This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
 # **JAIS Adapted 7B Chat Merged with V4 LORA adapters on Google Colab via: **
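The readout added in this diff lists every weight shape, so the "7B" in the model name can be sanity-checked with plain arithmetic. The sketch below is an illustration and not part of the commit. It assumes that each `Linear4bit` layer holds `in_features * out_features` logical weights (the packed 4-bit storage is smaller), that there are no biases (as the readout shows), that `lm_head` is counted separately from `embed_tokens`, and that the rotary embeddings carry no learned parameters. All names in the code are hypothetical.

```python
# Shapes copied from the printed LlamaForCausalLM readout.
VOCAB = 64000         # Embedding(64000, 4096) and lm_head out_features
HIDDEN = 4096         # model width
INTERMEDIATE = 11008  # MLP inner width
NUM_LAYERS = 32       # "(0-31): 32 x LlamaDecoderLayer"


def linear_params(in_features: int, out_features: int) -> int:
    """Logical weight count of a bias-free linear layer."""
    return in_features * out_features


def count_params() -> int:
    """Sum the logical parameters implied by the readout (assumptions above)."""
    embed = VOCAB * HIDDEN
    # q_proj, k_proj, v_proj, o_proj are all HIDDEN x HIDDEN.
    attn = 4 * linear_params(HIDDEN, HIDDEN)
    # gate_proj and up_proj expand to INTERMEDIATE; down_proj projects back.
    mlp = 2 * linear_params(HIDDEN, INTERMEDIATE) + linear_params(INTERMEDIATE, HIDDEN)
    # input_layernorm and post_attention_layernorm: one weight vector each.
    norms = 2 * HIDDEN
    per_layer = attn + mlp + norms
    final_norm = HIDDEN
    lm_head = linear_params(HIDDEN, VOCAB)
    return embed + NUM_LAYERS * per_layer + final_norm + lm_head


total_params = count_params()
print(f"{total_params:,} parameters (~{total_params / 1e9:.2f}B)")
# prints: 7,000,559,616 parameters (~7.00B)
```

Under these assumptions the total comes to roughly 7.0 billion logical parameters, which is consistent with the "7B" label. A count taken from the loaded 4-bit model itself may differ, because quantized layers store their weights in packed form.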