LLM generation model trained by the Jina AI Finetuner team.
This repo contains the LoRA weights (8-bit) for Falcon-40b fine-tuned on the Code Alpaca dataset.
Reproduction
This version of the weights was trained with the following hyperparameters:
- Epochs: 2
- Batch size: 128
- Micro batch size: 4
- Learning rate: 3e-4
- Lora r: 8
- Lora target modules: query_key_value
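For reference, these settings correspond roughly to the PEFT LoraConfig sketched below; lora_alpha, lora_dropout, bias and task_type are not listed in this card and are assumptions, so check the training repository for the exact values.

from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                 # Lora r
    target_modules=["query_key_value"],  # Lora target modules
    lora_alpha=16,                       # assumption, not listed in this card
    lora_dropout=0.05,                   # assumption, not listed in this card
    bias="none",                         # assumption
    task_type="CAUSAL_LM",
)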
You can reproduce the training using this repository:
https://github.com/jina-ai/jerboa
Make sure you install the repository's requirements before fine-tuning.
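A typical setup might look like the following (the requirements file name and steps are assumptions; check the repository README for the exact instructions):

git clone https://github.com/jina-ai/jerboa
cd jerboa
pip install -r requirements.txt  # file name assumed, see the repo README

Then launch fine-tuning with: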
python finetune.py \
--base-model tiiuae/falcon-40b --lora-target-modules query_key_value \
--data-path sahil2801/CodeAlpaca-20k --output-dir ./lora-alpaca-code \
--batch-size 128 --micro-batch-size 4 --eval-limit 45 \
--eval-file code_eval.jsonl --wandb-project jerboa --wandb-log-model \
--wandb-watch gradients --num-epochs 2
Inference
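The snippet below loads the base Falcon-40b model in float16, attaches the LoRA adapter from this repo, and generates a completion for an Alpaca-style prompt.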
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM
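# The tokenizer and base weights come from the original Falcon-40b repo;
# the LoRA adapter weights come from this repo.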
TOKENIZER_SOURCE = 'tiiuae/falcon-40b'
BASE_MODEL = 'tiiuae/falcon-40b'
LORA_REPO = 'jinaai/falcon-40b-code-alpaca-lora'
DEVICE = "cuda"
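# Alpaca-style prompt template; the Input section is left empty for this example.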
PROMPT = """
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Write a for loop in python
### Input:
### Response:
"""
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=BASE_MODEL,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map='auto',
)
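# Attach the LoRA adapter weights from this repo on top of the base model.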
model = PeftModel.from_pretrained(
    model=model,
    model_id=LORA_REPO,
)
model.eval()
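# Load the Falcon tokenizer (left padding for generation) and reuse the EOS token as the pad token.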
tokenizer = AutoTokenizer.from_pretrained(
    TOKENIZER_SOURCE,
    trust_remote_code=True,
    padding_side='left',
)
tokenizer.pad_token = tokenizer.eos_token
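# Tokenize the prompt and move the input tensors to the GPU.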
inputs = tokenizer(PROMPT, return_tensors="pt")
input_ids = inputs["input_ids"].to(DEVICE)
input_attention_mask = inputs["attention_mask"].to(DEVICE)
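# Generate up to 32 new tokens with the default (greedy) decoding settings.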
with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        attention_mask=input_attention_mask,
        return_dict_in_generate=True,
        max_new_tokens=32,
        eos_token_id=tokenizer.eos_token_id,
    )
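# Decode the single returned sequence back to text.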
generation_output = generation_output.sequences[0]
output = tokenizer.decode(generation_output, skip_special_tokens=True)
print(output)
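The decoded output contains the prompt followed by the model's completion. As an optional post-processing sketch: the split marker below matches the prompt template above, while merging the adapter with merge_and_unload and the output path are illustrative additions rather than part of the original recipe.

# Keep only the text generated after the "### Response:" marker.
response = output.split("### Response:")[-1].strip()
print(response)

# Optional: fold the LoRA weights into the base model so it can be saved and
# served as a plain transformers checkpoint (the output path is just an example).
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./falcon-40b-code-alpaca-merged")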
Contact
Join our Discord community to chat with other members about ideas.