Platypus2-70B-instruct-4bit-gptq
Platypus2-70B-instruct-4bit-gptq is a quantized version of garage-bAInd/Platypus2-70B-instruct, produced using GPTQ quantization.
The model is only 35 GB in size, compared with 127 GB for the original garage-bAInd/Platypus2-70B-instruct, and can run on a single A6000 GPU.
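The size reduction follows from the number of bits stored per weight. The sketch below is a rough estimate only: it assumes a nominal 70 billion parameters (the exact count is not stated in this card) and ignores quantization metadata such as scales and zero points, so reported file sizes will differ somewhat depending on the exact parameter count and the units used.

```python
# Back-of-the-envelope estimate of raw weight size at different precisions.
# ASSUMPTION: a nominal 70e9 parameters; the exact count is not given in this card.
NOMINAL_PARAMS = 70e9


def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Return the raw weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


fp16_gb = weight_size_gb(NOMINAL_PARAMS, 16)  # 16-bit weights
int4_gb = weight_size_gb(NOMINAL_PARAMS, 4)   # 4-bit weights

print(f"16-bit: ~{fp16_gb:.0f} GB, 4-bit: ~{int4_gb:.0f} GB")
```

Under these assumptions the 4-bit weights come to roughly a quarter of the 16-bit size, which is what brings the model within reach of a single 48 GB GPU.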
Model Details
- Quantized by: Mohamad Alhajar
- Model type: quantized version of Platypus2-70B-instruct using 4-bit quantization
- Language(s): English
Prompt Template
### Instruction:
<prompt> (without the <>)
### Response:
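As an illustration, the template can be filled programmatically. This is a minimal sketch: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, not part of any library, and the exact whitespace between sections is an assumption, since the card does not specify it.

```python
# Hypothetical helper for filling the prompt template shown above.
# ASSUMPTION: single newlines separate the sections; the card does not specify this.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n### Response:"


def build_prompt(instruction: str) -> str:
    """Insert the user's instruction into the template, trimming stray whitespace."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


print(build_prompt("Who was the first person to walk on the moon?"))
```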
Training Dataset
Platypus2-70B-instruct-4bit-gptq was quantized using GPTQ on the Alpaca dataset yahma/alpaca-cleaned.
Training Procedure
garage-bAInd/Platypus2-70B-instruct was quantized using GPTQ on two L40 48 GB GPUs.
How to Get Started with the Model
First, install auto_gptq:
pip install auto_gptq
Then use the following code sample to interact with the model.
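from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "malhajar/Platypus2-70B-instruct-4bit-gptq"

# Load the 4-bit GPTQ checkpoint; quantize_config=None reads the config stored with the model.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    inject_fused_attention=False,
    use_safetensors=True,
    trust_remote_code=False,
    use_triton=False,
    quantize_config=None,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

question = "Who was the first person to walk on the moon?"

# Build the prompt as an f-string so that {question} is actually substituted.
prompt = f'''
### Instruction:
{question}
### Response:'''

# Tokenize and move the inputs to the device the model was loaded on.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
# auto_gptq's generate() expects keyword arguments; max_new_tokens=256 is an arbitrary cap.
output = model.generate(input_ids=input_ids, max_new_tokens=256)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)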
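The decoded output of a causal language model typically contains the prompt followed by the completion. A minimal sketch for keeping only the answer is shown here; `extract_response` is a hypothetical helper, and the default `</s>` end-of-sequence string is an assumption based on the Llama 2 tokenizer (passing `tokenizer.eos_token` instead avoids hard-coding it).

```python
# Hypothetical post-processing helper: keep only the text after the response marker.
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str, eos_token: str = "</s>") -> str:
    """Return the text after the last response marker, without the EOS token.

    ASSUMPTION: "</s>" is the end-of-sequence string; pass tokenizer.eos_token to be safe.
    If the marker is absent, the whole (cleaned) string is returned.
    """
    _, _, tail = decoded.rpartition(RESPONSE_MARKER)
    return tail.replace(eos_token, "").strip()


sample = "### Instruction:\nWho was the first person to walk on the moon?\n### Response: Neil Armstrong</s>"
print(extract_response(sample))
```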
Citations
@article{platypus2023,
title={Platypus: Quick, Cheap, and Powerful Refinement of LLMs},
author={Ariel N. Lee and Cole J. Hunter and Nataniel Ruiz},
journal={arXiv preprint arXiv:2308.07317},
year={2023}
}
@misc{touvron2023llama,
title={Llama 2: Open Foundation and Fine-Tuned Chat Models},
author={Hugo Touvron and Louis Martin and Kevin Stone and Peter Albert and Amjad Almahairi and Yasmine Babaei and Nikolay Bashlykov and others},
year={2023},
eprint={2307.09288},
archivePrefix={arXiv},
}
@misc{frantar2023gptq,
title={GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers},
author={Elias Frantar and Saleh Ashkboos and Torsten Hoefler and Dan Alistarh},
year={2023},
eprint={2210.17323},
archivePrefix={arXiv},
primaryClass={cs.LG}
}