The merged model looks like the following (PyTorch readout of the printed model):

```
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(64000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear4bit(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): LlamaRMSNorm((4096,), eps=1e-05)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=64000, bias=False)
)
```
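
A readout like the one above can be reproduced by loading the merged checkpoint in 4-bit and printing the module tree. The snippet below is a minimal sketch, assuming a bitsandbytes 4-bit load; the repository id is a placeholder, not this model's actual name:

```python
# Minimal sketch: load the merged checkpoint in 4-bit and print its module tree.
# "your-username/your-merged-llama" is a placeholder repo id, not this model's name.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # quantize linear layers to 4-bit (Linear4bit)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "your-username/your-merged-llama",  # placeholder
    quantization_config=bnb_config,
    device_map="auto",
)

print(model)  # prints the LlamaForCausalLM structure shown above
```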
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
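
A rough outline of that Unsloth + TRL workflow is sketched below. This is a hedged illustration rather than this model's exact training recipe: the base checkpoint id, dataset, and hyperparameters are all placeholders.

```python
# Hedged sketch of an Unsloth + TRL QLoRA fine-tune; not this model's exact recipe.
from unsloth import FastLanguageModel
from datasets import Dataset
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="your-username/base-llama-7b",  # placeholder base model
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Tiny placeholder dataset with a single "text" column.
dataset = Dataset.from_dict({"text": ["### Instruction: Say hi.\n### Response: Hi!"]})

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(per_device_train_batch_size=1, max_steps=10, output_dir="outputs"),
)
trainer.train()

# Merge the LoRA weights into the base model and save the merged checkpoint.
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
```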