Incomplete example?

#1
by habsanero - opened

I'm having trouble getting this to run. I've downloaded the weights from Hugging Face, and the create_model() function finds them OK, but it runs into issues after that. I'm on an M1 Mac Pro with 32 GB of RAM.

Here is my output using the example file, with a small modification to get it to run (the weights_path argument needed to be changed to weights):

Loading weights from ../models/e5-mistral-7b-instruct/weights.npz
[WARNING] Missing keys in weights file: output.weight
[WARNING] Missing key output.weight in weights file
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/opt/homebrew/lib/python3.11/site-packages/numpy/linalg/linalg.py:2583: RuntimeWarning: overflow encountered in reduce
  return sqrt(add.reduce(s, axis=axis, keepdims=keepdims))
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
MLX Community org

What's the input?

I was running the example in the README verbatim. I saved it out to a .py file and ran it.

I think the example is currently incomplete/incorrect. For example, it doesn't use the last_token_pool method provided by the e5-mistral-7b-instruct team, which pools only the hidden state of each sequence's last token to get the most useful embedding. It also doesn't seem to use the attention mask for anything, which is an important step.
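For reference, here's roughly what that pooling step does. This is a minimal NumPy sketch adapted from the PyTorch last_token_pool helper on the intfloat/e5-mistral-7b-instruct model card, assuming right-padded batches (the upstream version also handles left padding):

import numpy as np

def last_token_pool(last_hidden_states, attention_mask):
    # Index of the last non-padding token in each sequence (right padding assumed)
    sequence_lengths = attention_mask.sum(axis=1) - 1
    batch_indices = np.arange(last_hidden_states.shape[0])
    # Pick one hidden state per sequence: (batch, seq, hidden) -> (batch, hidden)
    return last_hidden_states[batch_indices, sequence_lengths]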

MLX Community org

My initial idea was to give a rough sense of what you can do. Coming back soon with a fix. Thanks for the comment <3

As an FYI, I've had some success with:

from mlx_llm.model import create_model
from transformers import AutoTokenizer
import mlx.core as mx

# Load the MLX model and the matching tokenizer from the original repo
model = create_model("e5-mistral-7b-instruct")
tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-mistral-7b-instruct")

# Tokenize a small batch and run it through the embedding model
text = ["I like to play basketball", "I like to play tennis"]
tokens = tokenizer(text)
x = mx.array(tokens["input_ids"])
embeds = model.embed(x)
print(embeds)

You still need to do the whole mlx_llm install first, of course.

The output I see:

Loading weights from /Users/xxxxx/.cache/huggingface/hub/models--mlx-community--e5-mistral-7b-instruct-mlx/snapshots/22ac7676256b54fc97423032f2d9f26e9a8a92ea/weights.npz
[WARNING] Missing keys in weights file: output.weight
[WARNING] Missing key output.weight in weights file
array([[[-1.72168, 0.583008, -1.65137, ..., 0.221558, -2.03125, 2.67578],
        [-1.22266, -0.729004, -0.797363, ..., 3.53711, -4.75, 2.02148],
        [-0.523438, -0.513672, -0.918457, ..., 0.196411, 2.79297, -2.21094],
        [0.465332, 0.130127, 2.82422, ..., -5.42969, 0.048645, -2.75195],
        [3.10547, 1.83008, -1.22461, ..., -1.96582, -1.21875, 9.67188],
        [-1.2168, -2.43164, 0.737793, ..., 0.120483, 0.842773, 8.5625]],
       [[-1.72168, 0.583008, -1.65137, ..., 0.221558, -2.03125, 2.67578],
        [-1.22266, -0.729004, -0.797363, ..., 3.53711, -4.75, 2.02148],
        [-0.523438, -0.513672, -0.918457, ..., 0.196411, 2.79297, -2.21094],
        [0.465332, 0.130127, 2.82422, ..., -5.42969, 0.048645, -2.75195],
        [3.10547, 1.83008, -1.22461, ..., -1.96582, -1.21875, 9.67188],
        [-1.24805, 5.57812, -1.57715, ..., 3.67188, 4.06641, 6.41016]]], dtype=float16)

Process finished with exit code 0

^ shape [2, 6, 4096]
Although the missing output.weight warnings clearly sound alarming.
Any tips?

MLX Community org

The missing output.weight is not alarming. This is an embedding model, so it does not have the head layer usually used to predict the next token. Here you want to stop before generation and extract the embeddings.

MLX Community org
edited Jan 12


@habsanero I found where the problem is: you must call model.embed(x, norm=False).

Actually, I will change how the method works in the future, since norm=True is the default and generates wrong embeddings.
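Putting the pieces together, something like this should work (a sketch, not a definitive recipe: model.embed(x, norm=False) is the fix above, the last_token_pool helper is the one sketched earlier in this thread, and the padding setup plus float32 normalization are assumptions for avoiding the fp16 overflow from the first post, not part of mlx_llm):

from mlx_llm.model import create_model
from transformers import AutoTokenizer
import mlx.core as mx
import numpy as np

model = create_model("e5-mistral-7b-instruct")
tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-mistral-7b-instruct")

# Make the right-padding assumption of last_token_pool explicit
tokenizer.padding_side = "right"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure padding works

text = ["I like to play basketball", "I like to play tennis"]
tokens = tokenizer(text, padding=True)  # pad so both sequences share a length

x = mx.array(tokens["input_ids"])
mask = np.array(tokens["attention_mask"])

hidden = np.array(model.embed(x, norm=False))  # (batch, seq, hidden), float16
pooled = last_token_pool(hidden, mask)         # keep only each last token's state
pooled = pooled.astype(np.float32)             # normalize in fp32 to avoid overflow
pooled /= np.linalg.norm(pooled, axis=-1, keepdims=True)
print(pooled @ pooled.T)                       # cosine similarity matrix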

riccardomusmeci changed discussion status to closed
