Converting from mistral-finetune to HF-compatible weights

#79
by azimjon

I couldn't find a script to convert the output of the mistral-finetune library into transformers-compatible weights. I would appreciate it if someone could share one.

Here ya go:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/convert_mistral_weights_to_hf.py

Example calling it from Python (converting to HF format):

import subprocess
import sys
from pathlib import Path

convert_command = [
    'python3', 'transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py',
    '--input_dir', model_save_path,
    '--model_size', '13B',
    '--is_v3',
    '--output_dir', f'{model_save_path}-hf',
]

# Stream the converter's output to both stdout and a log file.
# The log is opened in binary mode because we write raw bytes from the pipe.
with open(f"{Path.home()}/convert.log", "wb") as f:
    hf_convert_result = subprocess.Popen(convert_command, stdout=subprocess.PIPE,
                                         cwd=f'{Path.home()}/')
    for c in iter(lambda: hf_convert_result.stdout.read(1), b""):
        sys.stdout.buffer.write(c)
        f.write(c)
    hf_convert_result.wait()
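If you don't need the Python wrapper, the equivalent direct shell invocation (same flags as in the snippet above; `$MODEL_DIR` is a placeholder for your consolidated-weights directory) is roughly:

```shell
# Run the converter directly; assumes the transformers repo is checked out
# in the current directory and $MODEL_DIR holds the mistral-finetune output.
python3 transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py \
    --input_dir "$MODEL_DIR" \
    --model_size 13B \
    --is_v3 \
    --output_dir "$MODEL_DIR-hf"
```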

@erik-akert but this is for the Mistral 7B model. I couldn't make it work for the Nemo model.

Gentle ping on this. Converting the Nemo model fails with the following traceback:

Traceback (most recent call last):
  File "/home/ec2-user/workspace/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 276, in <module>
    main()
  File "/home/ec2-user/workspace/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 271, in main
    convert_and_write_model(args.input_dir, args.output_dir, args.max_position_embeddings, args.modules_are_split)
  File "/home/ec2-user/workspace/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 217, in convert_and_write_model
    new_dict = convert_state_dict(original_state_dict, config)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/workspace/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 103, in convert_state_dict
    tensor = tensor.view(num_key_value_heads, dims_per_head, dim).reshape(key_value_dim, dim)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[8, 160, 5120]' is invalid for input of size 5242880

Any guidance on how to fix this?
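For what it's worth, the numbers in the error itself hint at the cause. The script derives the per-head width as hidden_size // num_heads, which gives 160 for NeMo's 5120-dim model with 32 heads, but Mistral-NeMo decouples head_dim from the hidden size and uses 128. A back-of-the-envelope check (all figures taken from the traceback; the 32-head count is an assumption from NeMo's published config):

```python
# Why view(8, 160, 5120) fails: the tensor's element count implies head_dim=128.
dim = 5120                 # hidden size, from the error message
n_heads = 32               # assumed attention head count (NeMo config)
num_key_value_heads = 8    # from the error message
tensor_numel = 5242880     # "input of size 5242880" from the RuntimeError

# What the converter infers as the per-head width:
inferred_dims_per_head = dim // n_heads            # 160

# What the failing tensor's size actually implies:
actual_head_dim = tensor_numel // (num_key_value_heads * dim)  # 128

print(inferred_dims_per_head, actual_head_dim)
```

So the reshape expects 8 * 160 * 5120 = 6,553,600 elements but the checkpoint tensor only has 8 * 128 * 5120 = 5,242,880. A fix would presumably need the converter to read head_dim from the checkpoint's params file instead of inferring it, but I haven't verified that against the script.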
