
Training

This is the English supervised fine-tuning (SFT) model of GPT-J, trained for 10k steps on the SODA dataset for the Chai Competition.

Why the OpenAssistant framework:

  • Easy to set up: changing the dataset and model in the config is all you need
  • Data processing is already available for most popular conversation datasets: SODA, Vicuna, OpenAssistant, ...

Configuration:

Add the following to the default config file configs/config.yaml:

soda-only:
  datasets:
    - soda:
        fraction: 0.1
        input_max_length: 1024
gptj-chai:
  dtype: fp16
  log_dir: gptj-soda
  model_name: EleutherAI/gpt-j-6b
  output_dir: output/gptj-soda-chai
  max_length: 1024
  warmup_steps: 100
  gradient_checkpointing: true
  gradient_accumulation_steps: 1
  per_device_train_batch_size: 8
  per_device_eval_batch_size: 8
  eval_steps: 5000
  save_steps: 5000
  num_train_epochs: 1
  save_total_limit: 1
  use_flash_attention: false
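
With this config, the effective global batch size is per_device_train_batch_size × gradient_accumulation_steps × number of GPUs. A quick back-of-the-envelope check (the GPU count is an assumption, since the card does not state it; adjust to your setup):

per_device_train_batch_size = 8
gradient_accumulation_steps = 1
num_gpus = 8  # assumption: not specified in this card
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 64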

Command to train:

deepspeed trainer_sft.py --local_rank=0 --configs defaults gptj-chai soda-only --cache_dir data_cache --deepspeed
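
The --configs flag selects named fragments from configs/config.yaml and combines them, with later fragments overriding keys set by earlier ones. A minimal sketch of that merge, assuming flat fragments (illustrative only; load_effective_config is a hypothetical helper, not the trainer's actual code):

import yaml

def load_effective_config(path="configs/config.yaml",
                          names=("defaults", "gptj-chai", "soda-only")):
    # Hypothetical helper: merge named fragments left to right,
    # so later fragments override keys from earlier ones.
    with open(path) as f:
        fragments = yaml.safe_load(f)
    merged = {}
    for name in names:
        merged.update(fragments[name])
    return merged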

Interactive Demo Code:

from transformers import AutoTokenizer, AutoModelForCausalLM



class ChatBot:
    def __init__(self, path="/mnt/hdd/duyphung/gptj-soda-chai/checkpoint-10000/"):
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        # Load in fp16 and move to the GPU for inference.
        self.model = AutoModelForCausalLM.from_pretrained(path).half().cuda().eval()
        # GPT-J has no pad token, so reuse the EOS token for padding.
        self.model.config.pad_token_id = self.tokenizer.eos_token_id
        self.tokenizer.pad_token_id = self.tokenizer.eos_token_id

    def chat(self, message):
        enc_dict = self.tokenizer(
            message,
            return_tensors='pt'
        )
        # Move the input tensors to the GPU.
        for x in enc_dict:
            enc_dict[x] = enc_dict[x].cuda()
        # Nucleus sampling: top_k=0 disables top-k filtering so only top_p applies.
        chat_history_ids = self.model.generate(
            input_ids=enc_dict['input_ids'],
            attention_mask=enc_dict['attention_mask'],
            max_new_tokens=64,
            temperature=0.7,
            do_sample=True,
            top_k=0,
            top_p=0.95,
        )
        # Drop the prompt tokens and decode only the newly generated reply.
        out = chat_history_ids[:, enc_dict['input_ids'].shape[-1]:][0]
        return self.tokenizer.decode(out, skip_special_tokens=True)


if __name__ == "__main__":
    bot_name = 'Bot:'
    prompt = "<|prompter|>"
    chat_history = []

    bot = ChatBot()
    while True:
        message = input("Me: ")
        chat_history.append(f'Me: {message}')
        # Extend the prompt in the OpenAssistant conversation format:
        # <|prompter|>user text<|endoftext|><|assistant|>model reply<|endoftext|>...
        prompt = prompt + message + "<|endoftext|><|assistant|>"
        response = bot.chat(prompt)
        print(f'{bot_name} {response}')
        prompt = prompt + response + "<|endoftext|><|prompter|>"
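
As a quick sanity check, the class can also be driven for a single turn; the prompt string below mirrors the format built up in the loop above:

# Minimal single-turn usage of the ChatBot class:
bot = ChatBot()
reply = bot.chat("<|prompter|>Hi! How are you today?<|endoftext|><|assistant|>")
print(reply)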