|
--- |
|
base_model: Nexusflow/Starling-LM-7B-beta |
|
license: apache-2.0 |
|
pipeline_tag: text-generation |
|
inference: false |
|
language: |
|
- en |
|
library_name: transformers |
|
tags: |
|
- conversational |
|
- reward model |
|
- RLHF |
|
- RLAIF |
|
--- |
|
|
|
# Starling-LM-7B-beta-GGUF |
|
|
|
- Model creator: [Nexusflow](https://huggingface.co/Nexusflow) |
|
- Original model: [Starling-LM-7B-beta](https://huggingface.co/Nexusflow/Starling-LM-7B-beta) |
|
|
|
|
## Description |
|
|
|
This repo contains GGUF format model files for [Starling-LM-7B-beta](https://huggingface.co/Nexusflow/Starling-LM-7B-beta).
|
|
|
**Model Summary** |
|
|
|
|
- **Developed by:** The Nexusflow Team (Banghua Zhu\*, Evan Frick\*, Tianhao Wu\*, Hanlin Zhu, Karthik Ganesan, Wei-Lin Chiang, Jian Zhang, and Jiantao Jiao).
|
- **Model type:** Language Model finetuned with RLHF / RLAIF |
|
- **License:** Apache-2.0, under the condition that the model is not used to compete with OpenAI.
|
- **Finetuned from model:** [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) (based on [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)) |
|
|
|
|
|
We introduce Starling-LM-7B-beta, an open large language model (LLM) trained with Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is finetuned from [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) with our new reward model [Nexusflow/Starling-RM-34B](https://huggingface.co/Nexusflow/Starling-RM-34B) and the PPO policy-optimization method of [Fine-Tuning Language Models from Human Preferences](https://arxiv.org/abs/1909.08593).

Harnessing the power of the ranking dataset [berkeley-nest/Nectar](https://huggingface.co/datasets/berkeley-nest/Nectar), the upgraded reward model [Starling-RM-34B](https://huggingface.co/Nexusflow/Starling-RM-34B), and the new reward-training and policy-tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 on MT-Bench with GPT-4 as a judge.
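

## Example Usage

GGUF files run on [llama.cpp](https://github.com/ggerganov/llama.cpp) and frontends built on it. Below is a minimal sketch using the `llama-cpp-python` bindings; the quantization file name is a placeholder for whichever file you download, and the prompt follows the OpenChat-3.5 chat template used by the base model.

```python
# Minimal sketch: running a Starling-LM-7B-beta GGUF file via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./starling-lm-7b-beta.Q4_K_M.gguf",  # placeholder file name
    n_ctx=8192,  # context length of the Mistral-7B-based model
)

# OpenChat-3.5 / Starling chat template: each turn ends with <|end_of_turn|>.
prompt = (
    "GPT4 Correct User: Hello, how are you?<|end_of_turn|>"
    "GPT4 Correct Assistant:"
)

output = llm(
    prompt,
    max_tokens=256,
    stop=["<|end_of_turn|>"],  # stop at the end of the assistant's turn
)
print(output["choices"][0]["text"])
```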
|
|
|
|
|
## Citation |
|
``` |
|
@misc{starling2023,
  title  = {Starling-7B: Improving LLM Helpfulness \& Harmlessness with RLAIF},
  url    = {},
  author = {Zhu, Banghua and Frick, Evan and Wu, Tianhao and Zhu, Hanlin and Ganesan, Karthik and Chiang, Wei-Lin and Zhang, Jian and Jiao, Jiantao},
  month  = {November},
  year   = {2023}
}
|
``` |