HF safetensors version
#3
by ehartford - opened
Who is making the HF safetensors version?
I've never used an NVIDIA model. Any idea how to convert them to HF safetensors?
What I could find:
- the MLP has fc1 and fc2 (presumably up_proj and down_proj in any order, no gate_proj), so conversion to Llama is already excluded
- the normalization layers have bias and are layernorm1p (which also excludes conversion to Llama format)
- this model uses GQA (96 query heads, 8 KV heads)
- the activation function is squared ReLU
with all of that said, writing a modeling file seems inevitable unless we can find an existing Transformers architecture that matches all of these characteristics...
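For illustration only, here's a rough sketch (not the actual NeMo/Nemotron code; the class names are made up) of what the MLP and normalization blocks described above could look like in a custom modeling file. My reading of layernorm1p is that the learned scale is stored as `weight` and applied as `(1 + weight)`, and the bias settings on the linear layers are a guess, so treat those details as assumptions:

```python
import torch
import torch.nn as nn


class LayerNorm1P(nn.LayerNorm):
    """layernorm1p variant (assumption): the learned scale is applied as
    (1 + weight), so a zero-initialized weight starts as an identity scale."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.layer_norm(
            x, self.normalized_shape, self.weight + 1.0, self.bias, self.eps
        )


class SquaredReLUMLP(nn.Module):
    """fc1 -> squared ReLU -> fc2; no gate_proj as in Llama."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # bias=False is a guess; check the actual checkpoint keys
        self.fc1 = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.fc2 = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(x)).pow(2))
```

The GQA part (96 query heads, 8 KV heads) would just be standard grouped-KV attention on top of this, so it isn't sketched here.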
Also rotary_pct (0.5 here) needs to be implemented (see GPT-NeoX for reference):
```python
# partial rotary, adapted from GPT-NeoX: only the first rotary_ndims of each
# head get RoPE applied; the rest pass through untouched
self.rotary_ndims = int(self.head_dim * config.rotary_pct)
...
query_rot = query_states[..., : self.rotary_ndims]
query_pass = query_states[..., self.rotary_ndims :]
key_rot = key_states[..., : self.rotary_ndims]
key_pass = key_states[..., self.rotary_ndims :]
cos, sin = self.rotary_emb(value_states, position_ids)
query_states, key_states = apply_rotary_pos_emb(query_rot, key_rot, cos, sin)
query_states = torch.cat((query_states, query_pass), dim=-1)
key_states = torch.cat((key_states, key_pass), dim=-1)
```
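So with rotary_pct = 0.5, only the first half of each head's dimensions gets rotary position embeddings; the remaining half passes through unchanged and is concatenated back afterwards.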
And it's pre-layernorm only, instead of pre- and post-layernorm like Llama.
Closest arch may be Phi-3. I'm unsure.
My checkpoint after finetuning with the NeMo framework looks like this checkpoint (but I don't have a model_config.yaml or .model file, only model_weights). How can I convert this to HF safetensors format?
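Not an authoritative recipe, but the general shape of a conversion is: consolidate the NeMo weights into an ordinary state dict, rename the keys to whatever your HF modeling file expects, and save with safetensors. A minimal sketch, assuming the weights can first be loaded as a single .ckpt/.pt state dict (the file name and the key-renaming rule below are placeholders, not the real Nemotron layout):

```python
# Rough sketch: load a NeMo-style state dict, rename keys to the HF-style
# names your custom modeling file expects, and write a safetensors file.
import torch
from safetensors.torch import save_file

state = torch.load("nemo_weights.ckpt", map_location="cpu")
# some Lightning/NeMo checkpoints nest the weights under "state_dict"
state = state.get("state_dict", state)

def rename(key: str) -> str:
    # hypothetical mapping; the real prefixes depend on the checkpoint layout
    return key.replace("model.decoder.layers.", "model.layers.")

hf_state = {rename(k): v.contiguous() for k, v in state.items()}
save_file(hf_state, "model.safetensors", metadata={"format": "pt"})
```

For a model this size you'd also want to shard the output into multiple safetensors files and write a matching index, but the key-mapping step is the part that depends on the architecture details discussed above.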
FailSpy has an effort going, but it seems to have stalled.