Solshine committed
Commit 0878ec9 · verified · 1 Parent(s): f9bee11

Update README.md

Files changed (1)
  1. README.md +32 -0
README.md CHANGED
@@ -71,6 +71,38 @@ Step Training Loss
  37 0.686400
  38 0.724200
 
+
+ Printing the merged model (PyTorch; trainable parameters) gives the following readout:
+ ```
+ LlamaForCausalLM(
+   (model): LlamaModel(
+     (embed_tokens): Embedding(64000, 4096)
+     (layers): ModuleList(
+       (0-31): 32 x LlamaDecoderLayer(
+         (self_attn): LlamaSdpaAttention(
+           (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
+           (k_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
+           (v_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
+           (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
+           (rotary_emb): LlamaRotaryEmbedding()
+         )
+         (mlp): LlamaMLP(
+           (gate_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
+           (up_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
+           (down_proj): Linear4bit(in_features=11008, out_features=4096, bias=False)
+           (act_fn): SiLU()
+         )
+         (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
+         (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
+       )
+     )
+     (norm): LlamaRMSNorm((4096,), eps=1e-05)
+     (rotary_emb): LlamaRotaryEmbedding()
+   )
+   (lm_head): Linear(in_features=4096, out_features=64000, bias=False)
+ )
+ ```
+
  This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
  # **JAIS Adapted 7B Chat Merged with V4 LORA adapters on Google Colab via: **
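
For reference, below is a minimal sketch of how a readout like the one above can be reproduced: load the 4-bit base model, attach the LoRA adapters with PEFT, merge them into the base weights, and print the module tree. The repo ids here are placeholders, not the actual model or adapter names.

```python
# Minimal sketch (the repo ids below are placeholders, not the actual names).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit; this is why the projections print as Linear4bit.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "your-org/jais-adapted-7b-chat",  # placeholder base repo id
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the trained LoRA adapters, then fold them into the base weights.
# Merging into 4-bit quantized weights requires a recent PEFT release.
model = PeftModel.from_pretrained(base, "your-org/v4-lora-adapters")  # placeholder adapter id
merged = model.merge_and_unload()

# Printing the merged model yields the LlamaForCausalLM module tree shown above.
print(merged)
```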