Jais-7b-chat (a double-quantized version of jais-13b-chat)

This model is a double-quantized version of jais-13b-chat by Core42. The goal is to make the model runnable on GPU-poor machines. For high-quality tasks it is better to use the original, unquantized 13b model. A sketch of the kind of quantization config involved is included below.

Model creator: Core42

Original model: jais-13b-chat
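For context, "double quantization" here refers to bitsandbytes 4-bit quantization with nested quantization enabled. The sketch below shows how such a checkpoint could be produced from the original core42/jais-13b-chat model; the exact parameters used for this upload are not documented, so these settings are assumptions.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization with nested ("double") quantization enabled.
# These settings are assumptions, not the exact config used for this upload.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,  # the "double quantized" part
    bnb_4bit_compute_dtype=torch.float16,
)

model_13b = AutoModelForCausalLM.from_pretrained(
    "core42/jais-13b-chat",
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)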

How To Run

Just run it as a text-generation pipeline task.

System Requirements:

It has been successfully tested on a Google Colab Pro T4 instance.

Steps:

  1. Install the required libraries:
pip install -Uq huggingface_hub transformers bitsandbytes xformers accelerate
  2. Create the pipeline:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, TextStreamer, BitsAndBytesConfig

tokenizer = AutoTokenizer.from_pretrained("erfanvaredi/jais-7b-chat")
model = AutoModelForCausalLM.from_pretrained(
    "erfanvaredi/jais-7b-chat",
    trust_remote_code=True,
    device_map='auto',
)

# Create a pipeline
pipe = pipeline(model=model, tokenizer=tokenizer, task='text-generation')
  3. Create the prompt:
chat = [
    {"role": "user", "content": 'Tell me a funny joke about Large Language Models.'},
]
prompt = pipe.tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
  4. Create a streamer (optional; use it if you want the generated text streamed as it is produced):
streamer = TextStreamer(
    tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,  # TextStreamer has no stop_token argument; extra kwargs go to tokenizer.decode()
)
  5. Ask the model (see the note after these steps for capturing the returned text):
pipe(
    prompt,
    streamer=streamer,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding; temperature only applies when sampling is enabled
)
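The pipeline call also returns its output (a list with one dict per prompt), so the reply can be captured programmatically in addition to, or instead of, streaming. A minimal sketch, with illustrative variable names:

outputs = pipe(
    prompt,
    max_new_tokens=256,
    do_sample=False,
)
# "generated_text" contains the prompt plus the completion; strip the prompt to keep only the reply
reply = outputs[0]["generated_text"][len(prompt):].strip()
print(reply)

# Optional: check how much memory the quantized model actually occupies
print(f"Model footprint: {model.get_memory_footprint() / 1e9:.2f} GB")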

:)
