# Training
This is a 10k-step English supervised fine-tuning (SFT) of GPT-J on the SODA dataset, trained for the Chai Competition.
- **Language:** English
- **Finetuned from:** [EleutherAI / GPT-J](https://huggingface.co/EleutherAI/gpt-j-6b)
- **Code:** [Open-Assistant/model/model_training](https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_training)
- **Dataset:** 10% of the [SODA dataset](https://huggingface.co/datasets/allenai/soda)
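As a quick way to see what the training data looks like, the sketch below loads a 10% slice of SODA directly with the `datasets` library. This only mirrors the `fraction: 0.1` setting used for training and is not necessarily the exact subset the model saw.

```python
# Sketch: peek at the SODA data this model was fine-tuned on.
# The train[:10%] slice only approximates the 10% fraction used for SFT.
from datasets import load_dataset

soda = load_dataset("allenai/soda", split="train[:10%]")
print(soda)                  # row count and column names
print(soda[0]["dialogue"])   # one multi-turn conversation (list of utterances)
```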
# Why the OpenAssistant framework:
- Easy to set up: changing the dataset and model entries in the config is all you need.
- Data processing is already implemented for most popular conversation datasets: SODA, Vicuna, OpenAssistant, ...
# Configuration:
Add the following entries to the default config file `configs/config.yaml`.

Dataset config:
```yaml
soda-only:
  datasets:
    - soda:
        fraction: 0.1
  input_max_length: 1024
```
Model and trainer config:
```yaml
gptj-chai:
  dtype: fp16
  log_dir: gptj-soda
  model_name: EleutherAI/gpt-j-6b
  output_dir: output/gptj-soda-chai
  max_length: 1024
  warmup_steps: 100
  gradient_checkpointing: true
  gradient_accumulation_steps: 1
  per_device_train_batch_size: 8
  per_device_eval_batch_size: 8
  eval_steps: 5000
  save_steps: 5000
  num_train_epochs: 1
  save_total_limit: 1
  use_flash_attention: false
```
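For orientation, here is a rough sketch of how the named entries in `configs/config.yaml` are composed when several config names are passed on the command line, with later names overriding earlier keys. This is a simplified illustration, not the actual Open-Assistant loader; it assumes a `defaults` entry exists in the file alongside the two entries above.

```python
# Simplified illustration of config composition for
# `--configs defaults gptj-chai soda-only`; requires PyYAML.
import yaml

with open("configs/config.yaml") as f:
    all_configs = yaml.safe_load(f)

merged = {}
for name in ["defaults", "gptj-chai", "soda-only"]:
    merged.update(all_configs[name])  # later entries override earlier keys

print(merged["model_name"])  # EleutherAI/gpt-j-6b
print(merged["datasets"])    # [{'soda': {'fraction': 0.1}}]
```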
# Command to train:
```bash
deepspeed trainer_sft.py --local_rank=0 --configs defaults gptj-chai soda-only --cache_dir data_cache --deepspeed
```
# Demo code:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM


class ChatBot:
    def __init__(self, path="/mnt/hdd/duyphung/gptj-soda-chai/checkpoint-10000/"):
        # Point `path` at your own checkpoint directory (or a Hub model id).
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(path).half().cuda().eval()
        # GPT-J has no dedicated pad token, so reuse the EOS token.
        self.model.config.pad_token_id = self.tokenizer.eos_token_id
        self.tokenizer.pad_token_id = self.tokenizer.eos_token_id

    def chat(self, message):
        # Tokenize the full prompt and move it to the GPU.
        enc_dict = self.tokenizer(message, return_tensors="pt")
        enc_dict = {k: v.cuda() for k, v in enc_dict.items()}
        chat_history_ids = self.model.generate(
            input_ids=enc_dict["input_ids"],
            attention_mask=enc_dict["attention_mask"],
            max_new_tokens=64,
            temperature=0.7,
            do_sample=True,
            top_k=0,
            top_p=0.95,
        )
        # Keep only the newly generated tokens (strip the prompt).
        out = chat_history_ids[:, enc_dict["input_ids"].shape[-1]:][0]
        return self.tokenizer.decode(out, skip_special_tokens=True)


if __name__ == "__main__":
    bot_name = "Bot:"
    prompt = "<|prompter|>"
    bot = ChatBot()
    while True:
        message = input("Me: ")
        # Append the user turn, then ask the model for the assistant turn.
        prompt = prompt + message + "<|endoftext|><|assistant|>"
        response = bot.chat(prompt)
        print(f"{bot_name} {response}")
        # Append the bot reply and open the next user turn.
        prompt = prompt + response + "<|endoftext|><|prompter|>"
```
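The demo wraps every turn in the same Open-Assistant style special tokens used for SFT: user turns start with `<|prompter|>` and the model's reply is requested after `<|endoftext|><|assistant|>`. The sampling parameters in `chat()` (`temperature`, `top_p`, `max_new_tokens`) are just reasonable defaults and can be tuned to taste.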