Incomplete example?
I'm having trouble getting this to run. I've downloaded the weights from Hugging Face and the create_model() function finds them OK, but it runs into issues after that. I'm on an M1 Mac Pro with 32 GB of RAM.
Here is my output using the example file, with a small modification to get it to run (weights_path needed to be changed to weights):
Loading weights from ../models/e5-mistral-7b-instruct/weights.npz
[WARNING] Missing keys in weights file: output.weight
[WARNING] Missing key output.weight in weights file
You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/opt/homebrew/lib/python3.11/site-packages/numpy/linalg/linalg.py:2583: RuntimeWarning: overflow encountered in reduce
return sqrt(add.reduce(s, axis=axis, keepdims=keepdims))
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
What's the input?
I was running the example in the README verbatim. I saved it out to a .py file and ran it.
I think the example is currently incomplete/incorrect. For example, it doesn't use the last_token_pool method provided by the e5-mistral-7b-instruct team, which takes the hidden state of only the last (non-padding) token as the sentence embedding. It also doesn't seem to use the attention mask for anything, which is an important step; see the rough sketch below.
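For reference, here is a rough NumPy sketch of what that pooling does, assuming right-padded batches (the official version on their Hugging Face model card is in PyTorch and also handles left padding):

import numpy as np

def last_token_pool(last_hidden_states, attention_mask):
    # last_hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
    # Index of the last real (non-padding) token in each sequence.
    sequence_lengths = attention_mask.sum(axis=1) - 1
    batch_indices = np.arange(last_hidden_states.shape[0])
    # One (hidden,) vector per sequence: the embedding of its last token.
    return last_hidden_states[batch_indices, sequence_lengths]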
My initial idea was to give a sense of what you can do. Coming back soon with a fix. Thanks for the comment <3
As an FYI, I have had some success with:
from mlx_llm.model import create_model
from transformers import AutoTokenizer
import mlx.core as mx
model = create_model("e5-mistral-7b-instruct")
tokenizer = AutoTokenizer.from_pretrained('intfloat/e5-mistral-7b-instruct')
text = ["I like to play basketball", "I like to play tennis"]
tokens = tokenizer(text)  # both sentences happen to tokenize to the same length, so no padding is needed here
x = mx.array(tokens["input_ids"])
embeds = model.embed(x)  # per-token hidden states, shape (batch, seq_len, hidden)
print(embeds)
Still need to do the whole mlx_llm install of course.
The output I see:
Loading weights from /Users/xxxxx/.cache/huggingface/hub/models--mlx-community--e5-mistral-7b-instruct-mlx/snapshots/22ac7676256b54fc97423032f2d9f26e9a8a92ea/weights.npz
[WARNING] Missing keys in weights file: output.weight
[WARNING] Missing key output.weight in weights file
array([[[-1.72168, 0.583008, -1.65137, ..., 0.221558, -2.03125, 2.67578],
[-1.22266, -0.729004, -0.797363, ..., 3.53711, -4.75, 2.02148],
[-0.523438, -0.513672, -0.918457, ..., 0.196411, 2.79297, -2.21094],
[0.465332, 0.130127, 2.82422, ..., -5.42969, 0.048645, -2.75195],
[3.10547, 1.83008, -1.22461, ..., -1.96582, -1.21875, 9.67188],
[-1.2168, -2.43164, 0.737793, ..., 0.120483, 0.842773, 8.5625]],
[[-1.72168, 0.583008, -1.65137, ..., 0.221558, -2.03125, 2.67578],
[-1.22266, -0.729004, -0.797363, ..., 3.53711, -4.75, 2.02148],
[-0.523438, -0.513672, -0.918457, ..., 0.196411, 2.79297, -2.21094],
[0.465332, 0.130127, 2.82422, ..., -5.42969, 0.048645, -2.75195],
[3.10547, 1.83008, -1.22461, ..., -1.96582, -1.21875, 9.67188],
[-1.24805, 5.57812, -1.57715, ..., 3.67188, 4.06641, 6.41016]]], dtype=float16)
Process finished with exit code 0
^ shape [2,6,4096]
Although the missing weights / output.weight warning clearly sounds alarming.
Any tips?
The missing weights/output is not alarming. This is an embedding model, so it does not have the head layer that is usually used to predict the next token. Here you want to stop before generation and extract the embeddings.
@habsanero found where the problem is. You must call model.embed(x, norm=False)
Actually, I will change how the method works in the future, since norm=True is currently the default and it generates wrong embeddings.
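Putting the pieces from this thread together, something like this should give usable sentence embeddings (an untested sketch: it assumes embed(x, norm=False) returns the raw per-token hidden states and that the tokenizer right-pads):

from mlx_llm.model import create_model
from transformers import AutoTokenizer
import mlx.core as mx
import numpy as np

model = create_model("e5-mistral-7b-instruct")
tokenizer = AutoTokenizer.from_pretrained('intfloat/e5-mistral-7b-instruct')
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # make sure padding is possible

text = ["I like to play basketball", "I like to play tennis"]
tokens = tokenizer(text, padding=True)
x = mx.array(tokens["input_ids"])
mask = np.array(tokens["attention_mask"])

# norm=False to avoid the float16 overflow seen above; cast to float32 before normalizing.
hidden = np.array(model.embed(x, norm=False)).astype(np.float32)  # (batch, seq_len, hidden)

# Last-token pooling: hidden state of the last non-padding token of each sequence.
last_idx = mask.sum(axis=1) - 1
embeddings = hidden[np.arange(hidden.shape[0]), last_idx]

# L2-normalize and compare.
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
print(embeddings @ embeddings.T)  # cosine similarities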