Model that is fine-tuned in 4-bit precision using QLoRA on timdettmers/openassistant-guanaco and sharded to be used on a free Google Colab instance.
It can be easily imported using the AutoModelForCausalLM
class from transformers
:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"guardrail/llama-2-7b-guanaco-8bit-sharded",
load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
- Downloads last month
- 10
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.