# Model Card for Carpincho-30b
This is the Carpincho-30B QLoRA 4-bit checkpoint, an instruction-tuned LLM based on LLaMA-30B. It is trained to answer in colloquial Argentine Spanish.

It was trained on 2x RTX 3090 GPUs (48 GB total) for 120 hours using the Hugging Face QLoRA code (4-bit quantization).
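The exact training script is not included in this card. As a point of reference, a minimal sketch of a QLoRA fine-tuning setup along these lines uses Hugging Face `peft` and `bitsandbytes`; the hyperparameters below are illustrative assumptions, not the configuration actually used:

```python
# Illustrative QLoRA setup; hyperparameters are assumptions, not the
# configuration actually used to train Carpincho-30B.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
    bnb_4bit_use_double_quant=True,         # double quantization
)

base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

lora_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,  # assumed QLoRA-paper-style values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights train
```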
## Model Details
The model is provided in LoRA format.
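Because only the adapter weights are distributed, you can either load the adapter on top of the base model at inference time (as in the Usage section below) or merge it into the base weights once. A minimal merge sketch, assuming the base model is loaded in half precision (merging requires the full fp16/fp32 base weights, roughly 65 GB for a 30B model in fp16; the output path is hypothetical):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model in fp16 (not 4-bit): merging writes the LoRA deltas
# back into the base weight matrices.
base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-30b", torch_dtype=torch.float16, device_map="auto"
)
merged = PeftModel.from_pretrained(base, "carpincho-30b-qlora").merge_and_unload()
merged.save_pretrained("carpincho-30b-merged")  # hypothetical output path
```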
## Usage

Here is example inference code. You will first need to install the following requirements:
```
bitsandbytes==0.39.0
transformers @ git+https://github.com/huggingface/transformers.git
peft @ git+https://github.com/huggingface/peft.git
accelerate @ git+https://github.com/huggingface/accelerate.git
einops==0.6.1
evaluate==0.4.0
scikit-learn==1.2.2
sentencepiece==0.1.99
wandb==0.15.3
```
Then run:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, LlamaTokenizer

model_name = "models/huggyllama_llama-30b/"
adapters_name = 'carpincho-30b-qlora'

print(f"Starting to load the model {model_name} into memory")

# Load the base model in 4-bit, then apply the QLoRA adapter on top.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map="sequential",
)

print(f"Loading {adapters_name} into memory")
model = PeftModel.from_pretrained(model, adapters_name)

tokenizer = LlamaTokenizer.from_pretrained(model_name)
tokenizer.bos_token_id = 1
stop_token_ids = [0]

print(f"Successfully loaded the model {model_name} into memory")

def main(tokenizer):
    # Alpaca-style instruction prompt.
    prompt = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
%s
### Response:
''' % "Hola, como estas?"

    batch = tokenizer(prompt, return_tensors="pt")
    batch = {k: v.cuda() for k, v in batch.items()}

    with torch.no_grad():
        generated = model.generate(
            inputs=batch["input_ids"],
            do_sample=True,
            use_cache=True,
            repetition_penalty=1.1,
            max_new_tokens=100,
            temperature=0.9,
            top_p=0.95,
            top_k=40,
            return_dict_in_generate=True,
            output_attentions=False,
            output_hidden_states=False,
            output_scores=False,
        )

    result_text = tokenizer.decode(generated["sequences"].cpu().tolist()[0])
    print(result_text)

main(tokenizer)
```
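Note that `stop_token_ids` is defined above but never passed to `generate`. If you want generation to actually halt on those tokens, one way (a sketch on my part, not part of the original code) is a custom `StoppingCriteria`:

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnTokens(StoppingCriteria):
    """Stop as soon as the most recently generated token is a stop token."""
    def __init__(self, stop_token_ids):
        super().__init__()
        self.stop_token_ids = stop_token_ids

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return input_ids[0, -1].item() in self.stop_token_ids

stopping_criteria = StoppingCriteriaList([StopOnTokens(stop_token_ids)])
# Then pass stopping_criteria=stopping_criteria to model.generate(...).
```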
## Model Description
- Developed by: Alfredo Ortega (@ortegaalfredo)
- Model type: 30B LLM QLoRA
- Language(s) (NLP): English and colloquial Argentine Spanish
- License: Free for non-commercial use, but I'm not the police.
- Finetuned from model: https://huggingface.co/huggyllama/llama-30b
## Model Sources

- Repository: https://huggingface.co/huggyllama/llama-30b
- Paper: https://arxiv.org/abs/2302.13971
## Uses
This is a general-purpose LLM chatbot that can be used to interact directly with humans.
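As a sketch of that direct interaction, a minimal REPL loop could look like the following, assuming `model` and `tokenizer` have already been loaded as in the Usage section above (this loop is illustrative, not part of the original card):

```python
# Minimal interactive loop; assumes `model` and `tokenizer` from the
# Usage section are already in scope.
PROMPT = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
%s
### Response:
'''

while True:
    instruction = input("> ")
    if not instruction:
        break
    batch = tokenizer(PROMPT % instruction, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(
            inputs=batch["input_ids"],
            do_sample=True,
            max_new_tokens=200,
            temperature=0.9,
            top_p=0.95,
        )
    # Decode only the newly generated tokens, after the prompt.
    print(tokenizer.decode(output[0][batch["input_ids"].shape[1]:], skip_special_tokens=True))
```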
## Bias, Risks, and Limitations

This bot is uncensored and may provide shocking answers. It also reflects biases present in its training data.
### Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
## Model Card Contact
Contact the creator at @ortegaalfredo on Twitter/GitHub.