Quantization made by Richard Erkhov.

llama-2-70b-fb16-guanaco-1k - GGUF

Model creator: https://huggingface.co/quantumaikr/
Original model: https://huggingface.co/quantumaikr/llama-2-70b-fb16-guanaco-1k/

Name	Quant method	Size
llama-2-70b-fb16-guanaco-1k.Q2_K.gguf	Q2_K	23.71GB
llama-2-70b-fb16-guanaco-1k.IQ3_XS.gguf	IQ3_XS	26.37GB
llama-2-70b-fb16-guanaco-1k.IQ3_S.gguf	IQ3_S	27.86GB
llama-2-70b-fb16-guanaco-1k.Q3_K_S.gguf	Q3_K_S	27.86GB
llama-2-70b-fb16-guanaco-1k.IQ3_M.gguf	IQ3_M	28.82GB
llama-2-70b-fb16-guanaco-1k.Q3_K.gguf	Q3_K	30.99GB
llama-2-70b-fb16-guanaco-1k.Q3_K_M.gguf	Q3_K_M	30.99GB
llama-2-70b-fb16-guanaco-1k.Q3_K_L.gguf	Q3_K_L	33.67GB
llama-2-70b-fb16-guanaco-1k.IQ4_XS.gguf	IQ4_XS	34.64GB
llama-2-70b-fb16-guanaco-1k.Q4_0.gguf	Q4_0	36.2GB
llama-2-70b-fb16-guanaco-1k.IQ4_NL.gguf	IQ4_NL	36.55GB
llama-2-70b-fb16-guanaco-1k.Q4_K_S.gguf	Q4_K_S	36.55GB
llama-2-70b-fb16-guanaco-1k.Q4_K.gguf	Q4_K	38.58GB
llama-2-70b-fb16-guanaco-1k.Q4_K_M.gguf	Q4_K_M	38.58GB
llama-2-70b-fb16-guanaco-1k.Q4_1.gguf	Q4_1	40.2GB
llama-2-70b-fb16-guanaco-1k.Q5_0.gguf	Q5_0	44.2GB
llama-2-70b-fb16-guanaco-1k.Q5_K_S.gguf	Q5_K_S	44.2GB
llama-2-70b-fb16-guanaco-1k.Q5_K.gguf	Q5_K	45.41GB
llama-2-70b-fb16-guanaco-1k.Q5_K_M.gguf	Q5_K_M	45.41GB
llama-2-70b-fb16-guanaco-1k.Q5_1.gguf	Q5_1	48.2GB
llama-2-70b-fb16-guanaco-1k.Q6_K.gguf	Q6_K	52.7GB
llama-2-70b-fb16-guanaco-1k.Q8_0.gguf	Q8_0	68.26GB

Original model description:

license: cc-by-nc-4.0 language: - en pipeline_tag: text-generation

quantumaikr/llama-2-70b-fb16-guanaco-1k

Model Description

quantumaikr/llama-2-70b-fb16-guanaco-1k is a Llama2 70B model finetuned on an guanaco, mlabonne/guanaco-llama2-1k Dataset

Usage

Start chatting with quantumaikr/llama-2-70b-fb16-guanaco-1k using the following code snippet:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("quantumaikr/llama-2-70b-fb16-guanaco-1k")
model = AutoModelForCausalLM.from_pretrained("quantumaikr/llama-2-70b-fb16-guanaco-1k", torch_dtype=torch.float16, device_map="auto")

system_prompt = "### System:\nYou are QuantumLM, an AI that follows instructions extremely well. Help as much as you can. Remember, be safe, and don't do anything illegal.\n\n"

message = "Write me a poem please"
prompt = f"{system_prompt}### User: {message}\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
output = model.generate(**inputs, do_sample=True, top_p=0.95, top_k=0, max_new_tokens=256)

print(tokenizer.decode(output[0], skip_special_tokens=True))

QuantumLM should be used with this prompt format:

### System:
This is a system prompt, please behave and help the user.

### User:
Your prompt here

### Assistant
The output of QuantumLM

Use and Limitations

Intended Use

These models are intended for research only, in adherence with the CC BY-NC-4.0 license.

Limitations and bias

Although the aforementioned dataset helps to steer the base language models into "safer" distributions of text, not all biases and toxicity can be mitigated through fine-tuning. We ask that users be mindful of such potential issues that can arise in generated responses. Do not treat model outputs as substitutes for human judgment or as sources of truth. Please use it responsibly.