Edit model card

Model Card for alokabhishek/Mistral-7B-Instruct-v0.2-bnb-8bit

This repo contains 8-bit quantized (using bitsandbytes) model Mistral AI_'s Mistral-7B-Instruct-v0.2

Model Details

About 8 bit quantization using bitsandbytes

How to Get Started with the Model

Use the code below to get started with the model.

How to run from Python code

First install the package

!pip install --quiet bitsandbytes
!pip install --quiet --upgrade transformers # Install latest version of transformers
!pip install --quiet --upgrade accelerate
!pip install --quiet sentencepiece
pip install flash-attn --no-build-isolation

Import

import torch
import os
from torch import bfloat16
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig, LlamaForCausalLM

Use a pipeline as a high-level helper

model_id_mistral = "alokabhishek/Mistral-7B-Instruct-v0.2-bnb-8bit"

tokenizer_mistral = AutoTokenizer.from_pretrained(model_id_mistral, use_fast=True)

model_mistral = AutoModelForCausalLM.from_pretrained(
    model_id_mistral,
    device_map="auto"
)


pipe_mistral = pipeline(model=model_mistral, tokenizer=tokenizer_mistral, task='text-generation')

prompt_mistral = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."

output_mistral = pipe_llama(prompt_mistral, max_new_tokens=512)

print(output_mistral[0]["generated_text"])

Uses

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Evaluation

Metrics

[More Information Needed]

Results

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Downloads last month
13
Safetensors
Model size
7.24B params
Tensor type
F32
FP16
I8
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.