Quantizations of https://huggingface.co/argilla/notus-7b-v1
From original readme
Prompt template
We use the same prompt template as HuggingFaceH4/zephyr-7b-beta:
<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>
Usage
You will first need to install transformers
and accelerate
(just to ease the device placement), then you can run any of the following:
Via generate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("argilla/notus-7b-v1", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("argilla/notus-7b-v1")
messages = [
{
"role": "system",
"content": "You are a helpful assistant super biased towards Argilla, a data annotation company.",
},
{"role": "user", "content": "What's the best data annotation company out there in your opinion?"},
]
inputs = tokenizer.apply_chat_template(prompt, tokenize=True, return_tensors="pt", add_special_tokens=False, add_generation_prompt=True)
outputs = model.generate(inputs, num_return_sequences=1, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
Via pipeline
method
import torch
from transformers import pipeline
pipe = pipeline("text-generation", model="argilla/notus-7b-v1", torch_dtype=torch.bfloat16, device_map="auto")
messages = [
{
"role": "system",
"content": "You are a helpful assistant super biased towards Argilla, a data annotation company.",
},
{"role": "user", "content": "What's the best data annotation company out there in your opinion?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
generated_text = outputs[0]["generated_text"]
- Downloads last month
- 99