
HF safetensors version

#3 by ehartford - opened

Who is making the HF safetensors version?

I've never used an NVIDIA model. Any idea how to convert one to HF safetensors?

What I could find:

  • the MLP has fc1 and fc2 (presumably up_proj and down_proj in any order, no gate_proj), so conversion to Llama is already excluded
  • the normalization layers have a bias and are layernorm1p (which also rules out the Llama format)
  • this model is GQA (96 query heads, 8 KV heads)
  • the activation function is squared ReLU

With all of that said, writing a custom modeling file seems inevitable unless we can find an existing Transformers architecture that matches all of these characteristics...
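
For illustration, here is a minimal sketch (not the actual modeling file) of the two non-Llama pieces above, layernorm1p and the squared-ReLU fc1/fc2 MLP. The class names are made up, the zero-centered-weight reading of layernorm1p is my understanding of NeMo's normalization option, and the bias settings on fc1/fc2 are guesses that would need to be checked against the checkpoint:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LayerNorm1P(nn.LayerNorm):
        # "layernorm1p": the weight is stored zero-centered and applied as (1 + weight);
        # this reading of NeMo's normalization option is an assumption
        def forward(self, x):
            return F.layer_norm(
                x, self.normalized_shape, self.weight + 1.0, self.bias, self.eps
            )

    class SquaredReLUMLP(nn.Module):
        # fc1 -> relu(x)**2 -> fc2, with no gate_proj
        # (whether fc1/fc2 actually carry biases should be verified against the weights)
        def __init__(self, hidden_size, ffn_hidden_size):
            super().__init__()
            self.fc1 = nn.Linear(hidden_size, ffn_hidden_size, bias=False)
            self.fc2 = nn.Linear(ffn_hidden_size, hidden_size, bias=False)

        def forward(self, x):
            return self.fc2(torch.relu(self.fc1(x)) ** 2)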

Also rotary_pct (0.5 here) needs to be implemented (see GPT-NeoX for reference):

        # in __init__: only the first rotary_ndims dims of each head get RoPE
        self.rotary_ndims = int(self.head_dim * config.rotary_pct)
        ...

        # in forward: split each head into a rotary part and a pass-through part
        query_rot = query_states[..., : self.rotary_ndims]
        query_pass = query_states[..., self.rotary_ndims :]
        key_rot = key_states[..., : self.rotary_ndims]
        key_pass = key_states[..., self.rotary_ndims :]

        cos, sin = self.rotary_emb(value_states, position_ids)
        query_states, key_states = apply_rotary_pos_emb(query_rot, key_rot, cos, sin)

        # re-attach the dims that RoPE did not touch
        query_states = torch.cat((query_states, query_pass), dim=-1)
        key_states = torch.cat((key_states, key_pass), dim=-1)
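
In other words, with rotary_pct = 0.5 RoPE is applied to only the first half of each head's dimensions and the remaining half passes through unchanged (e.g. a hypothetical head_dim of 128 would give rotary_ndims = 64).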

And it's pre-layernorm instead of pre- and post-layernorm like Llama.

Closest arch may be Phi-3. I'm unsure.

My checkpoint after finetuning with the NeMo framework looks like this checkpoint (but I don't have a model_config.yaml or .model file, only model_weights). How can I convert this to HF safetensors format?
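
Not an answer to the NeMo-specific loading part, but once you can materialize a plain PyTorch state dict from model_weights, writing it out as safetensors is the easy step. A rough sketch, where the file path is a placeholder and the key-renaming map is entirely hypothetical (the real mapping depends on whatever HF modeling file ends up being used):

    import torch
    from safetensors.torch import save_file

    # state_dict: a flat {name: tensor} mapping recovered from the NeMo
    # model_weights directory (the hard, NeMo-specific part is getting this)
    state_dict = torch.load("consolidated_nemo_weights.pt", map_location="cpu")  # placeholder path

    # hypothetical renaming from Megatron/NeMo-style keys to HF-style keys
    def rename(key: str) -> str:
        return (
            key.replace("decoder.layers.", "model.layers.")
               .replace("mlp.linear_fc1", "mlp.fc1")
               .replace("mlp.linear_fc2", "mlp.fc2")
        )

    hf_state_dict = {rename(k): v.contiguous() for k, v in state_dict.items()}

    # safetensors requires contiguous tensors with no shared storage between entries
    save_file(hf_state_dict, "model.safetensors", metadata={"format": "pt"})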

FailSpy has an effort going, but it seems to have stalled.
