|
--- |
|
base_model: Nexusflow/Starling-LM-7B-beta |
|
license: apache-2.0 |
|
pipeline_tag: text-generation |
|
inference: false |
|
language: |
|
- en |
|
library_name: transformers |
|
tags: |
|
- conversational |
|
- reward model |
|
- RLHF |
|
- RLAIF |
|
--- |
|
|
|
# Starling-LM-7B-beta-GGUF |
|
|
|
- Model creator: [Nexusflow](https://huggingface.co/Nexusflow) |
|
- Original model: [Starling-LM-7B-beta](https://huggingface.co/Nexusflow/Starling-LM-7B-beta) |
|
|
|
|
## Description |
|
|
|
This repo contains GGUF format model files for [Starling-LM-7B-beta](https://huggingface.co/Nexusflow/Starling-LM-7B-beta).
|
|
|
**Model Summary** |
|
|
|
|
- **Developed by:** The Nexusflow Team (Banghua Zhu\*, Evan Frick\*, Tianhao Wu\*, Hanlin Zhu, Karthik Ganesan, Wei-Lin Chiang, Jian Zhang, and Jiantao Jiao).
|
- **Model type:** Language Model finetuned with RLHF / RLAIF |
|
- **License:** Apache-2.0, under the condition that the model is not used to compete with OpenAI.
|
- **Finetuned from model:** [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) (based on [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)) |
|
|
|
|
|
We introduce Starling-LM-7B-beta, an open large language model (LLM) trained with Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is finetuned from [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) with our new reward model [Nexusflow/Starling-RM-34B](https://huggingface.co/Nexusflow/Starling-RM-34B) and the PPO policy-optimization method of [Fine-Tuning Language Models from Human Preferences](https://arxiv.org/abs/1909.08593).

Harnessing the power of the ranking dataset [berkeley-nest/Nectar](https://huggingface.co/datasets/berkeley-nest/Nectar), the upgraded reward model [Starling-RM-34B](https://huggingface.co/Nexusflow/Starling-RM-34B), and the new reward-training and policy-tuning pipeline, Starling-LM-7B-beta scores an improved 8.12 on MT-Bench with GPT-4 as a judge.
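

## Example Usage

GGUF files run on [llama.cpp](https://github.com/ggerganov/llama.cpp) and frontends built on it. Below is a minimal sketch using the `llama-cpp-python` bindings; the quantization file name is a placeholder for whichever file you download, and the prompt follows the OpenChat-3.5 chat template used by the base model.

```python
# Minimal sketch: running a Starling-LM-7B-beta GGUF file via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./starling-lm-7b-beta.Q4_K_M.gguf",  # placeholder file name
    n_ctx=8192,  # context length of the Mistral-7B-based model
)

# OpenChat-3.5 / Starling chat template: each turn ends with <|end_of_turn|>.
prompt = (
    "GPT4 Correct User: Hello, how are you?<|end_of_turn|>"
    "GPT4 Correct Assistant:"
)

output = llm(
    prompt,
    max_tokens=256,
    stop=["<|end_of_turn|>"],  # stop at the end of the assistant's turn
)
print(output["choices"][0]["text"])
```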
|
|
|
|
|
## Citation |
|
``` |
|
@misc{starling2023,
  title  = {Starling-7B: Improving LLM Helpfulness \& Harmlessness with RLAIF},
  url    = {},
  author = {Zhu, Banghua and Frick, Evan and Wu, Tianhao and Zhu, Hanlin and Ganesan, Karthik and Chiang, Wei-Lin and Zhang, Jian and Jiao, Jiantao},
  month  = {November},
  year   = {2023}
}
|
``` |