Adding `safetensors` variant of this model
This is an automated PR created with https://huggingface.co/spaces/safetensors/convert
This new file is equivalent to pytorch_model.bin, but safe in the sense that no arbitrary code can be put into it.
These files also happen to load much faster than their pytorch counterpart:
https://colab.research.google.com/github/huggingface/notebooks/blob/main/safetensors_doc/en/speed.ipynb
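For a rough local sanity check of that claim, something along these lines works (the filenames below are placeholders; the notebook above is the proper benchmark):

import time
import torch
from safetensors.torch import load_file

# Time a plain CPU load of each format (filenames are hypothetical).
t0 = time.perf_counter()
torch.load("pytorch_model.bin", map_location="cpu")
print(f"pytorch .bin : {time.perf_counter() - t0:.2f}s")

t0 = time.perf_counter()
load_file("model.safetensors", device="cpu")
print(f".safetensors : {time.perf_counter() - t0:.2f}s")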
The widgets on your model page will run using this model even if this is not merged, making sure the file actually works.
If you find any issues, please report them here: https://huggingface.co/spaces/safetensors/convert/discussions
Feel free to ignore this PR.
Kind of scary to see this being automated. There's no way to trust that these new files are not manipulated.
Also, no reason to put duplicate content in a source repo.
Hi @Suparious! This is from the official bot. You can run the conversion script yourself and check that the output is identical.
Note that safetensors files are inherently safer than .bin files.
Of course it's up to you; feel free not to merge if you prefer!
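For reference, the core of such a conversion is small. A minimal sketch (assuming an unsharded checkpoint that fits in memory; the actual conversion also handles sharded checkpoints and index files) looks like this:

import torch
from safetensors.torch import save_file

# Load the pickle-based checkpoint, then re-save the tensors as safetensors.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
# safetensors rejects shared/aliased storage, so materialize independent copies.
state_dict = {k: v.clone().contiguous() for k, v in state_dict.items()}
save_file(state_dict, "model.safetensors")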
| no way to trust that these new files are not manipulated.
You can easily load both to check for consistency:
from huggingface_hub import hf_hub_download
import torch
from safetensors.torch import load_file

# Compare each of the two shards: the .bin from the main branch against
# the .safetensors from this PR (refs/pr/21).
for i in range(1, 3):
    pt_filename = f"pytorch_model-0000{i}-of-00002.bin"
    sf_filename = f"model-0000{i}-of-00002.safetensors"
    pt_filename = hf_hub_download("cognitivecomputations/dolphin-2.1-mistral-7b", filename=pt_filename)
    sf_filename = hf_hub_download("cognitivecomputations/dolphin-2.1-mistral-7b", filename=sf_filename, revision="refs/pr/21")
    original = torch.load(pt_filename, map_location="cpu")
    new = load_file(sf_filename)
    # Every tensor in the original shard must be present and identical
    # in the safetensors shard.
    for k, v in original.items():
        torch.testing.assert_close(v, new[k])
        print(f"{k} is OK")
| no reason to put duplicate content in a source repo.
I'm not sure you understand the seriousness of what pickle files enable an attacker to do on your machine.
And what an attacker could do if they got hold of anyone with write credentials in your company: by modifying those weight files, they could reach everyone who is using your models.
This is not a necessary attack vector; closing it is as simple as moving to safetensors, really (plus the speed benefits).
Internally at HF, we're mostly banning the presence of pickle files in any production system just to give you an idea.
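To make the risk concrete, here is a minimal, generic sketch of the mechanism (not specific to this repository): pickle lets an object decide what gets called when it is deserialized, and torch.load on a .bin file goes through pickle.

import os
import pickle

class Payload:
    # __reduce__ tells the unpickler how to "reconstruct" the object;
    # here it instructs it to call os.system with an arbitrary command.
    def __reduce__(self):
        return (os.system, ("echo arbitrary code executed on load",))

data = pickle.dumps(Payload())
pickle.loads(data)  # the command runs as a side effect of loading

A safetensors file, by contrast, is just a JSON header plus raw tensor bytes, so loading it cannot trigger code execution.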
Thank you for the consistency-check example above.
| no reason to put duplicate content in a source repo.
| I'm not sure you understand the seriousness of what pickle files enable an attacker to do on your machine.
Yes, this was 100% accurate. Thank you for your patience and understanding.
| Internally at HF, we're mostly banning the presence of pickle files in any production system just to give you an idea.
We appreciate your detailed explanation and thoughtful reasoning. We will have to bring this up internally.
@ehartford - These safetensors files appear to match the pickle versions. I'll leave it to you if you want to merge.