Converting from mistral-finetune to HF-compatible weights

#79
by azimjon

I couldn't find a script to convert the output of the mistral-finetune library into transformers-compatible weights. I would appreciate it if someone could share one.

Here ya go:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/convert_mistral_weights_to_hf.py

Example calling it from Python (converting to HF format):

import subprocess
import sys
from pathlib import Path

convert_command = [
    'python3', 'transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py',
    '--input_dir', model_save_path,
    '--model_size', '13B',
    '--is_v3',
    '--output_dir', f'{model_save_path}-hf',
]

# Stream the converter's output to both stdout and a log file.
# The log is opened in binary mode because we write raw bytes from the pipe.
with open(f"{Path.home()}/convert.log", "wb") as f:
    hf_convert_result = subprocess.Popen(convert_command, stdout=subprocess.PIPE,
                                         cwd=f'{Path.home()}/')
    for c in iter(lambda: hf_convert_result.stdout.read(1), b""):
        sys.stdout.buffer.write(c)
        f.write(c)
    hf_convert_result.wait()
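If you don't need the Python wrapper, the equivalent direct shell invocation (same flags as in the snippet above; `$MODEL_DIR` is a placeholder for your consolidated-weights directory) is roughly:

```shell
# Run the converter directly; assumes the transformers repo is checked out
# in the current directory and $MODEL_DIR holds the mistral-finetune output.
python3 transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py \
    --input_dir "$MODEL_DIR" \
    --model_size 13B \
    --is_v3 \
    --output_dir "$MODEL_DIR-hf"
```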

@erik-akert but this is for the Mistral 7B model. I couldn't make it work for the Nemo model.

Gentle ping on this. Converting the Nemo model fails with the following traceback:

Traceback (most recent call last):
  File "/home/ec2-user/workspace/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 276, in <module>
    main()
  File "/home/ec2-user/workspace/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 271, in main
    convert_and_write_model(args.input_dir, args.output_dir, args.max_position_embeddings, args.modules_are_split)
  File "/home/ec2-user/workspace/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 217, in convert_and_write_model
    new_dict = convert_state_dict(original_state_dict, config)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ec2-user/workspace/transformers/src/transformers/models/mistral/convert_mistral_weights_to_hf.py", line 103, in convert_state_dict
    tensor = tensor.view(num_key_value_heads, dims_per_head, dim).reshape(key_value_dim, dim)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[8, 160, 5120]' is invalid for input of size 5242880

Any guidance on how to fix this?
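For what it's worth, the numbers in the error itself hint at the cause. The script derives the per-head width as hidden_size // num_heads, which gives 160 for NeMo's 5120-dim model with 32 heads, but Mistral-NeMo decouples head_dim from the hidden size and uses 128. A back-of-the-envelope check (all figures taken from the traceback; the 32-head count is an assumption from NeMo's published config):

```python
# Why view(8, 160, 5120) fails: the tensor's element count implies head_dim=128.
dim = 5120                 # hidden size, from the error message
n_heads = 32               # assumed attention head count (NeMo config)
num_key_value_heads = 8    # from the error message
tensor_numel = 5242880     # "input of size 5242880" from the RuntimeError

# What the converter infers as the per-head width:
inferred_dims_per_head = dim // n_heads            # 160

# What the failing tensor's size actually implies:
actual_head_dim = tensor_numel // (num_key_value_heads * dim)  # 128

print(inferred_dims_per_head, actual_head_dim)
```

So the reshape expects 8 * 160 * 5120 = 6,553,600 elements but the checkpoint tensor only has 8 * 128 * 5120 = 5,242,880. A fix would presumably need the converter to read head_dim from the checkpoint's params file instead of inferring it, but I haven't verified that against the script.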
