kaitchup/OPT-1.3B-RLHF-DSChatLoRA

Text Generation

Model Card for OPT-1.3B-RLHF-DSChatLoRA

This is a chat model fine-tuned with reinforcement learning from human feedback (RLHF) using DeepSpeed Chat and LoRA. It is based on OPT-1.3B.

Model Details

Model Description

Developed by: The Kaitchup
Model type: Causal language model
Language(s) (NLP): English
License: cc-by-nc-sa-4.0
Finetuned from model: facebook/opt-1.3b

Model Sources

The model has been trained with the procedure described in this article:

Train Instruct LLMs On Your GPU with DeepSpeed Chat — Step #3: Reinforcement Learning with Human Feedback
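
The card does not include a usage snippet, so the following is only a minimal sketch of how the model might be queried with the Hugging Face transformers library. It assumes that the repository contains full model weights and a tokenizer loadable through AutoModelForCausalLM and AutoTokenizer (if it only holds LoRA adapters, they would have to be loaded on top of facebook/opt-1.3b instead), and that the model expects a "Human: ... Assistant:" prompt as in DeepSpeed Chat examples. The helper names build_prompt and generate_reply are hypothetical, and the prompt layout is not confirmed by this card.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in a Human/Assistant template.

    Assumption: this layout mirrors DeepSpeed Chat examples; the card
    does not state the exact prompt format used during training.
    """
    return f"Human: {user_message.strip()}\n\nAssistant:"


def generate_reply(
    user_message: str,
    model_id: str = "kaitchup/OPT-1.3B-RLHF-DSChatLoRA",
    max_new_tokens: int = 128,
) -> str:
    """Generate one reply with greedy decoding (hypothetical helper)."""
    # Imported here so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repository ships both a tokenizer and full weights.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling generate_reply("What is RLHF?") would download the model on first use. Greedy decoding is chosen here only to keep the sketch deterministic; sampling parameters can be passed to generate in its place.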

Downloads last month: 39

Dataset used to train kaitchup/OPT-1.3B-RLHF-DSChatLoRA