base_model: mistralai/Pixtral-12B-2409
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
tags:
- multimodal
- mistral
- pixtral
- unsloth
Finetune Llama 3.2, Qwen 2.5, Gemma 2, Mistral 2-5x faster with 70% less memory via Unsloth!
We have a free Google Colab Tesla T4 notebook for Pixtral (12B) 2409 here: https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing
And a free notebook for Llama 3.2 Vision (11B) here
unsloth/Pixtral-12B-2409-bnb-4bit
For more details on the model, please go to Mistral's original model card
✨ Finetune for Free
All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, vLLM or uploaded to Hugging Face.
| Unsloth supports | Free Notebooks | Performance | Memory use |
|---|---|---|---|
| Llama-3.2 (3B) | ▶️ Start on Colab | 2.4x faster | 58% less |
| Llama-3.2 (11B vision) | ▶️ Start on Colab | 2x faster | 40% less |
| Qwen2 VL (7B) | ▶️ Start on Colab | 1.8x faster | 40% less |
| Qwen2.5 (7B) | ▶️ Start on Colab | 2x faster | 60% less |
| Llama-3.1 (8B) | ▶️ Start on Colab | 2.4x faster | 58% less |
| Phi-3.5 (mini) | ▶️ Start on Colab | 2x faster | 50% less |
| Gemma 2 (9B) | ▶️ Start on Colab | 2.4x faster | 58% less |
| Mistral (7B) | ▶️ Start on Colab | 2.2x faster | 62% less |
| DPO - Zephyr | ▶️ Start on Colab | 1.9x faster | 19% less |
- This conversational notebook is useful for ShareGPT ChatML / Vicuna templates.
- This text completion notebook is for raw text. This DPO notebook replicates Zephyr.
- * Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
Special Thanks
A huge thank you to the Mistral team for creating and releasing these models.
Pixtral 2409 Details
The mistral_common library has image support! You can now pass images and image URLs alongside text in the user message.
pip install --upgrade mistral_common
To use the model checkpoint:
# pip install huggingface-hub
from huggingface_hub import snapshot_download
snapshot_download(repo_id="mistral-community/pixtral-12b-240910", local_dir="...")
PIXTRAL - 12B - v0.1 10/09/24

md5sum
b8e9126ef0c15a1130c14b15e8432a67  consolidated.safetensors
68b39355a7b14a7d653292dab340a0be  params.json
10229adc84036ff8fe44a2a8e2ad9ba9  tekken.json

Released by the Mistral AI team
- Use GELU for the vision adapter
- Use 2D ROPE for the vision encoder
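The md5 checksums in the release note can be checked after download. The following is a minimal sketch, assuming the downloaded directory contains the three files under the same names as in the release note; `md5_of_file` and `verify_checkpoint` are hypothetical helper names, not part of any library.

```python
import hashlib
from pathlib import Path

# Checksums copied from the release note in this card.
EXPECTED_MD5 = {
    "consolidated.safetensors": "b8e9126ef0c15a1130c14b15e8432a67",
    "params.json": "68b39355a7b14a7d653292dab340a0be",
    "tekken.json": "10229adc84036ff8fe44a2a8e2ad9ba9",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(local_dir, expected=EXPECTED_MD5):
    """Map each expected file name to True/False; a missing file counts as False."""
    results = {}
    for name, expected_md5 in expected.items():
        path = Path(local_dir) / name
        results[name] = path.is_file() and md5_of_file(path) == expected_md5
    return results
```

Calling `verify_checkpoint(local_dir)` with the same directory passed to `snapshot_download` returns a dictionary such as `{"params.json": True, ...}`; any `False` entry suggests a missing or corrupted file that is worth downloading again.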
Images
You can encode images as follows:
from mistral_common.protocol.instruct.messages import (
    UserMessage,
    TextChunk,
    ImageURLChunk,
    ImageChunk,
)
from PIL import Image
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
tokenizer = MistralTokenizer.from_model("pixtral")
image = Image.new('RGB', (64, 64))
# tokenize images and text
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="Describe this image"),
                    ImageChunk(image=image),
                ]
            )
        ],
        model="pixtral",
    )
)
tokens, text, images = tokenized.tokens, tokenized.text, tokenized.images
# Count the number of tokens
print("# tokens", len(tokens))
print("# images", len(images))
Image URLs
You can pass image URLs, which are downloaded automatically:
url_dog = "https://picsum.photos/id/237/200/300"
url_mountain = "https://picsum.photos/seed/picsum/200/300"
# tokenize image urls and text
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="Can this animal"),
                    ImageURLChunk(image_url=url_dog),
                    TextChunk(text="live here?"),
                    ImageURLChunk(image_url=url_mountain),
                ]
            )
        ],
        model="pixtral",
    )
)
tokens, text, images = tokenized.tokens, tokenized.text, tokenized.images
# Count the number of tokens
print("# tokens", len(tokens))
print("# images", len(images))
ImageData
You can also pass an image encoded as base64:
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        messages=[
            UserMessage(
                content=[
                    TextChunk(text="What is this?"),
                    # the base64-encoded image string goes here; its value is omitted in this card
                    ImageURLChunk(image_url=""),
                ]
            )
        ],
        model="pixtral",
    )
)
tokens, text, images = tokenized.tokens, tokenized.text, tokenized.images
# Count the number of tokens
print("# tokens", len(tokens))
print("# images", len(images))
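The base64 string passed as `image_url` is not shown in this card. One way to produce such a string is sketched below, assuming the image is supplied as a data URL with a `data:<mime>;base64,` prefix; that prefix format and the helper name `to_data_url` are assumptions for illustration, not confirmed by the card.

```python
import base64


def to_data_url(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a base64 data URL string."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The raw bytes can come from reading an image file in binary mode. The returned string would then be passed as the `image_url` argument of `ImageURLChunk`, with `mime_type` set to match the actual image format.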