
metadata

license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms

The world's first Gemma fine-tune based on the openchat-3.5-0106 data and method (C-RLFT). It performs almost on par with the Mistral-based OpenChat and much better than Gemma-7B and Gemma-7B-it.

Please refer to openchat-3.5-0106 for details.

P.S.: 6T pre-training tokens + 0.003 init std dev + C-RLFT is the secret sauce?

P.P.S.: @Google team, we know your model is great, but please use an OSI-approved license like Mistral (or even Phi and Orca).

Benchmarks

| Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT |
|---|---|---|---|---|---|---|---|---|---|---|
| OpenChat-3.5-0106 Gemma | 7B | 64.4 | 7.83 | 67.7 | 52.7 | 50.2 | 55.4 | 65.7 | 81.5 | 63.7 |
| OpenChat-3.5-0106 Mistral | 7B | 64.5 | 7.8 | 71.3 | 51.5 | 49.1 | 61.0 | 65.8 | 77.4 | 62.2 |
| ChatGPT (March) | ???B | 61.5 | 7.94 | 48.1 | 47.6 | 47.1 | 57.7 | 67.3 | 74.9 | 70.1 |
| Gemma-7B | 7B | - | - | 32.3 | - | 41.7 | - | 64.3 | 46.4 | - |
| Gemma-7B-it \* | 7B | 25.4 | - | 28.0 | 38.4 | 32.5 | 34.1 | 26.5 | 10.8 | 7.6 |
| OpenHermes 2.5 | 7B | 59.3 | 7.54 | 48.2 | 49.4 | 46.5 | 57.5 | 63.8 | 73.5 | 59.9 |

*: Gemma-7b-it failed to understand and follow most few-shot templates.
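
This card does not say how the Average column is computed. The reported values appear consistent with a plain mean over the benchmarks available for each model, with MT-Bench rescaled from its 0-10 range to 0-100. The sketch below checks that assumption against the rows above; the function name and the rescaling rule are illustrative guesses, not the authors' stated method.

```python
def average_score(mt_bench, other_scores):
    """Mean over available benchmarks (assumed formula, not stated in the card).

    mt_bench: MT-Bench score on a 0-10 scale, or None if not reported.
    other_scores: remaining benchmark scores on a 0-100 scale.
    """
    scores = list(other_scores)
    if mt_bench is not None:
        # Assumption: MT-Bench is multiplied by 10 to match the other scales.
        scores.append(mt_bench * 10)
    return round(sum(scores) / len(scores), 1)


# Rows copied from the benchmark table: (MT-Bench, [HumanEval, BBH MC,
# AGIEval, TruthfulQA, MMLU, GSM8K, BBH CoT]).
ROWS = {
    "OpenChat-3.5-0106 Gemma": (7.83, [67.7, 52.7, 50.2, 55.4, 65.7, 81.5, 63.7]),
    "OpenChat-3.5-0106 Mistral": (7.8, [71.3, 51.5, 49.1, 61.0, 65.8, 77.4, 62.2]),
    "ChatGPT (March)": (7.94, [48.1, 47.6, 47.1, 57.7, 67.3, 74.9, 70.1]),
    "Gemma-7B-it": (None, [28.0, 38.4, 32.5, 34.1, 26.5, 10.8, 7.6]),
    "OpenHermes 2.5": (7.54, [48.2, 49.4, 46.5, 57.5, 63.8, 73.5, 59.9]),
}

AVERAGES = {name: average_score(mt, rest) for name, (mt, rest) in ROWS.items()}
```

Under this assumption the recomputed values match the Average column for every row that reports one.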

Usage

To use this model, we highly recommend installing the OpenChat package by following the installation guide in our repository, then starting the OpenChat OpenAI-compatible API server with the serving command from the table below. The server is optimized for high-throughput deployment using vLLM and can run on a consumer GPU with 24 GB of memory. To enable tensor parallelism, append `--tensor-parallel-size N` to the serving command.

Once started, the server listens at localhost:18888 and is compatible with the OpenAI ChatCompletion API specification. See the example request below. You can also use the OpenChat Web UI for a more user-friendly experience.

If you want to deploy the server as an online service, use `--api-keys sk-KEY1 sk-KEY2 ...` to specify the allowed API keys, and `--disable-log-requests --disable-log-stats --log-file openchat.log` to log only to a file. For security, we recommend placing an HTTPS gateway in front of the server.

| Model | Size | Context | Weights | Serving |
|---|---|---|---|---|
| OpenChat-3.5-0106-Gemma | 7B | 8192 | Huggingface | `python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106-gemma --engine-use-ray --worker-use-ray` |
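
The optional flags described above can be combined with the serving command in several ways. As a rough sketch, the helper below assembles the argument list in Python; `build_serving_command` and its parameters are hypothetical names introduced for illustration, while every flag it emits is taken from this README.

```python
import shlex


def build_serving_command(
    model="openchat/openchat-3.5-0106-gemma",
    tensor_parallel_size=None,
    api_keys=(),
    log_file=None,
):
    """Return the serving command as a list of arguments (illustrative helper)."""
    cmd = [
        "python", "-m", "ochat.serving.openai_api_server",
        "--model", model,
        "--engine-use-ray", "--worker-use-ray",
    ]
    if tensor_parallel_size is not None:
        # Enables tensor parallelism across N GPUs.
        cmd += ["--tensor-parallel-size", str(tensor_parallel_size)]
    if api_keys:
        # Restricts access to the listed API keys.
        cmd += ["--api-keys", *api_keys]
    if log_file is not None:
        # Logs only to a file, as recommended for online deployment.
        cmd += ["--disable-log-requests", "--disable-log-stats", "--log-file", log_file]
    return cmd


# Shell-ready string for a two-GPU deployment with one API key.
EXAMPLE = shlex.join(
    build_serving_command(tensor_parallel_size=2, api_keys=["sk-KEY1"], log_file="openchat.log")
)
```

The resulting list can be passed to `subprocess.run`, or printed with `shlex.join` and pasted into a shell.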

Example request

curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_3.5_gemma_new",
    "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
  }'
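
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl example; the function names are illustrative, and two details are assumptions rather than facts from this README: that API keys are sent as an OpenAI-style `Authorization: Bearer` header, and that the response follows the usual ChatCompletion shape with `choices[0].message.content`.

```python
import json
import urllib.request

# Default address from this README; change it if the server runs elsewhere.
API_URL = "http://localhost:18888/v1/chat/completions"


def build_request(prompt, api_key=None, model="openchat_3.5_gemma_new", url=API_URL):
    """Build a POST request with the same JSON body as the curl example."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        # Assumption: keys are passed as an OpenAI-style bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def chat(prompt, api_key=None, timeout=60):
    """Send the prompt to a running server and return the reply text."""
    request = build_request(prompt, api_key=api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the response uses the OpenAI ChatCompletion layout.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("Write a poem to describe yourself")` should return the generated text as a string.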

Conversation template

⚠️ Notice: This template differs from the Mistral version. The end-of-turn token is now `<end_of_turn>` (the Mistral version uses `<|end_of_turn|>`). Remember to set `<end_of_turn>` as the end-of-generation token.

GPT4 Correct User: Hello<end_of_turn>GPT4 Correct Assistant: Hi<end_of_turn>GPT4 Correct User: How are you today?<end_of_turn>GPT4 Correct Assistant:

With system message (NOT recommended, may degrade performance)

You are a helpful assistant.<end_of_turn>GPT4 Correct User: Hello<end_of_turn>GPT4 Correct Assistant: Hi<end_of_turn>GPT4 Correct User: How are you today?<end_of_turn>GPT4 Correct Assistant:
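
When calling the model without the OpenChat server, the prompt has to be assembled by hand. The two templates above can be sketched as a small string builder; `build_prompt` and `truncate_at_end_of_turn` are hypothetical helpers written for illustration, not part of the OpenChat package.

```python
END_OF_TURN = "<end_of_turn>"

# Role prefixes exactly as they appear in the conversation template.
ROLE_PREFIX = {
    "user": "GPT4 Correct User: ",
    "assistant": "GPT4 Correct Assistant: ",
}


def build_prompt(messages, system=None):
    """Render a list of {"role", "content"} dicts into the Gemma-version template."""
    parts = []
    if system:
        # A system message is supported but NOT recommended by this README.
        parts.append(system + END_OF_TURN)
    for message in messages:
        parts.append(ROLE_PREFIX[message["role"]] + message["content"] + END_OF_TURN)
    # Leave the assistant turn open so the model generates the reply.
    parts.append("GPT4 Correct Assistant:")
    return "".join(parts)


def truncate_at_end_of_turn(generated_text):
    """Cut raw generated text at the first end-of-turn token, if present."""
    return generated_text.split(END_OF_TURN, 1)[0]


HISTORY = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"},
]
PROMPT = build_prompt(HISTORY)
```

Truncating in post-processing is only a fallback; where the inference library supports it, configuring `<end_of_turn>` as a stop token avoids generating the extra text in the first place.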