lapp0 committed on
Commit
48d0fba
1 Parent(s): d57472c

End of training

README.md CHANGED
@@ -107,28 +107,6 @@ LlamaForCausalLM(
  (self_attn): LlamaSdpaAttention(
  (q_proj): Linear(in_features=576, out_features=576, bias=False)
  (k_proj): Linear(in_features=576, out_features=192, bias=False)
- @@ -10,17 +10,16 @@
- (o_proj): Linear(in_features=576, out_features=576, bias=False)
- (rotary_emb): LlamaRotaryEmbedding()
- )
- - (mlp): LlamaMLP(
- + (mlp): LigerSwiGLUMLP(
- (gate_proj): Linear(in_features=576, out_features=1536, bias=False)
- (up_proj): Linear(in_features=576, out_features=1536, bias=False)
- (down_proj): Linear(in_features=1536, out_features=576, bias=False)
- - (act_fn): SiLU()
- )
- - (input_layernorm): LlamaRMSNorm((576,), eps=1e-05)
- - (post_attention_layernorm): LlamaRMSNorm((576,), eps=1e-05)
- + (input_layernorm): LigerRMSNorm((576,), eps=1e-05, offset=0.0)
- + (post_attention_layernorm): LigerRMSNorm((576,), eps=1e-05, offset=0.0)
- )
- )
- - (norm): LlamaRMSNorm((576,), eps=1e-05)
- + (norm): LigerRMSNorm((576,), eps=1e-05, offset=0.0)
- (rotary_emb): LlamaRotaryEmbedding()
- )
- (lm_head): Linear(in_features=576, out_features=49152, bias=False)
 
  ```
 
@@ -136,7 +114,7 @@ LlamaForCausalLM(
  <br/>
 
  # Train Dataset
- Trained on 44,060,170 tokens from the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
+ Trained on 44,061,015 tokens from the [wikimedia/wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia) dataset.
 
  - Num Samples: `49,900`
  - Subset: `20231101.en`
@@ -185,7 +163,7 @@ The following hyperparameters were used during training:
  weight=0
  )
  )`
- - lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x7c610d513ac0>`
+ - lr_scheduler: `<torch.optim.lr_scheduler.LambdaLR object at 0x7c6117e3aad0>`
  - student_model_name_or_path: `None`
  - student_config_name_or_path: `None`
  - student_model_config: `{'num_hidden_layers': 15}`
logs/attn_weight=0, bf16=True, per_device_train_batch_size=4, run_name=bf16/events.out.tfevents.1726170813.1c1a426a2fee ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89b005923b155722294ecdd76a75adae3712e0b8944137742eb912fba4f3e226
+ size 249
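
The three added lines are a Git LFS pointer file: the repository stores this small text stub, while the actual TensorBoard event file (249 bytes here) lives in LFS storage. As a minimal sketch of how such a pointer could be read, the following parses the `version`, `oid`, and `size` fields shown above. The names `LfsPointer` and `parse_lfs_pointer` are hypothetical helpers introduced for illustration; they are not part of Git LFS or any Hugging Face library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    """Fields of a Git LFS pointer file (hypothetical container for illustration)."""
    version: str    # URL of the pointer spec version
    hash_algo: str  # hash algorithm named before the colon in `oid`
    digest: str     # hex digest of the real file's contents
    size: int       # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the `key value` lines of a pointer file into an LfsPointer."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is a key, a single space, then the value.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form `<algorithm>:<hex digest>`.
    algo, _, digest = fields["oid"].partition(":")
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


# Pointer contents copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:89b005923b155722294ecdd76a75adae3712e0b8944137742eb912fba4f3e226
size 249
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer.hash_algo, pointer.size)
```

With the real event file downloaded, its SHA-256 digest and byte length could be compared against `pointer.digest` and `pointer.size` to confirm the download matches what the commit recorded.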