	---
	license: apache-2.0
	datasets:
	- databricks/databricks-dolly-15k
	language:
	- en
	---
	## Model Description
	- Base model: LLaMA2-7B
	- Fine-tuning method: QLoRA
	- Dataset: databricks/databricks-dolly-15k
	- GPU: a single RTX 4090
	- Goal: instruction fine-tuning of the model
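As a rough illustration of why 4-bit quantization makes a single 24 GB RTX 4090 workable here, the sketch below estimates the memory taken by the base model's weights alone. It assumes a round 7 billion parameters and ignores quantization constants, activations, the LoRA adapter, and optimizer state, so the numbers are lower bounds rather than measurements. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # assumption: round figure for a "7B" model

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # half-precision weights
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"fp16 weights:  {fp16_gb:.1f} GB")
print(f"4-bit weights: {int4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the fp16 footprint, which leaves headroom on the card for the adapter and training state.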
	## Usage
	- Load the data
	```python
	from datasets import load_dataset
	from random import randrange


	# Load the dataset from the Hub and pick one random sample
	dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
	sample = dataset[randrange(len(dataset))]
	```
	- Use the model
	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	model_name_or_path = "snowfly/llama2-7b-QLoRA-dolly"
	tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
	# Load the weights in 4-bit (requires bitsandbytes and a CUDA GPU)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name_or_path,
	    trust_remote_code=True,
	    low_cpu_mem_usage=True,
	    torch_dtype=torch.float16,
	    load_in_4bit=True,
	)
	model = model.eval()


	prompt = f"""### Instruction:
	Use the Input below to create an instruction, which could have been used to generate the input using an LLM.

	### Input:
	{sample['response']}

	### Response:
	"""

	# Tokenize the prompt and move it to the same device as the model
	input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.to(model.device)

	# Sample up to 100 new tokens with nucleus sampling
	outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.9)

	# Decode only the newly generated tokens, skipping the prompt
	generated = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)

	print(f"Prompt:\n{sample['response']}\n")
	print(f"Generated instruction:\n{generated}")
	print(f"Ground truth:\n{sample['instruction']}")
	```
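
The prompt template above can be factored into a small helper so the same format is reused across samples. The sketch below is an illustration only: `build_prompt` and the example record are hypothetical, and the record merely mimics the `instruction` and `response` fields used above.

```python
INSTRUCTION = (
    "Use the Input below to create an instruction, "
    "which could have been used to generate the input using an LLM."
)


def build_prompt(sample: dict) -> str:
    """Format one record into the Instruction/Input/Response template."""
    return (
        f"### Instruction:\n{INSTRUCTION}\n\n"
        f"### Input:\n{sample['response']}\n\n"
        f"### Response:\n"
    )


# Hypothetical record, for illustration only
example = {
    "instruction": "Name a primary color.",
    "response": "Red is a primary color.",
}
prompt = build_prompt(example)
print(prompt)
```

Keeping the template in one function makes it easier to keep the inference prompt consistent with whatever format was used during fine-tuning.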