---
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
---
The world's first Gemma fine-tune based on the openchat-3.5-0106 data and method (C-RLFT). Its performance is almost the same as the Mistral-based OpenChat, and much better than Gemma-7B and Gemma-7B-it.
Please refer to openchat-3.5-0106 for details.
P.S.: 6T pre-training tokens + 0.003 init std dev + C-RLFT is the secret sauce?
P.P.S.: @Google team, we know your model is great, but please use an OSI-approved license like Mistral (or even Phi and Orca).
Benchmarks
Model | # Params | Average | MT-Bench | HumanEval | BBH MC | AGIEval | TruthfulQA | MMLU | GSM8K | BBH CoT |
---|---|---|---|---|---|---|---|---|---|---|
OpenChat-3.5-0106 Gemma | 7B | 64.4 | 7.83 | 67.7 | 52.7 | 50.2 | 55.4 | 65.7 | 81.5 | 63.7 |
OpenChat-3.5-0106 Mistral | 7B | 64.5 | 7.8 | 71.3 | 51.5 | 49.1 | 61.0 | 65.8 | 77.4 | 62.2 |
ChatGPT (March) | ???B | 61.5 | 7.94 | 48.1 | 47.6 | 47.1 | 57.7 | 67.3 | 74.9 | 70.1 |
Gemma-7B | 7B | - | - | 32.3 | - | 41.7 | - | 64.3 | 46.4 | - |
Gemma-7B-it * | 7B | 25.4 | - | 28.0 | 38.4 | 32.5 | 34.1 | 26.5 | 10.8 | 7.6 |
OpenHermes 2.5 | 7B | 59.3 | 7.54 | 48.2 | 49.4 | 46.5 | 57.5 | 63.8 | 73.5 | 59.9 |
*: Gemma-7B-it failed to understand and follow most few-shot templates.
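For the fully populated rows, the Average column appears consistent with an unweighted mean of the eight scores after rescaling MT-Bench from its 0-10 range to 0-100 (multiplying by 10). This is a reading of the numbers, not a documented formula. The sketch below recomputes it with `awk` as a sanity check; `recompute_averages` is a hypothetical helper, and the Gemma-7B and Gemma-7B-it rows are left out because the card does not say how missing scores are handled.

```bash
#!/usr/bin/env bash
# Sketch: recompute the "Average" column for the fully populated table rows.
# Assumption (inferred from the numbers, not a documented formula):
#   Average = (10 * MT-Bench + the 7 other scores) / 8
recompute_averages() {
  # Input rows: "Model | Params | Average | MT-Bench | 7 more scores |"
  awk -F'|' '
  {
    sum = $4 * 10                          # MT-Bench rescaled from 0-10 to 0-100
    for (i = 5; i <= 11; i++) sum += $i    # HumanEval through BBH CoT
    avg = sum / 8
    diff = avg - $3
    if (diff < 0) diff = -diff
    name = $1
    sub(/[ \t]+$/, "", name)               # trim padding before the first separator
    status = (diff < 0.05) ? "ok" : "MISMATCH"
    printf "%s: reported %.1f, recomputed %.2f, %s\n", name, $3, avg, status
  }'
}

recompute_averages <<'EOF'
OpenChat-3.5-0106 Gemma | 7B | 64.4 | 7.83 | 67.7 | 52.7 | 50.2 | 55.4 | 65.7 | 81.5 | 63.7 |
OpenChat-3.5-0106 Mistral | 7B | 64.5 | 7.8 | 71.3 | 51.5 | 49.1 | 61.0 | 65.8 | 77.4 | 62.2 |
ChatGPT (March) | ???B | 61.5 | 7.94 | 48.1 | 47.6 | 47.1 | 57.7 | 67.3 | 74.9 | 70.1 |
OpenHermes 2.5 | 7B | 59.3 | 7.54 | 48.2 | 49.4 | 46.5 | 57.5 | 63.8 | 73.5 | 59.9 |
EOF
```

A line ends in `ok` when the reported value is within 0.05 of the recomputed mean, which allows for rounding to one decimal place.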
Usage
To use this model, we highly recommend installing the OpenChat package by following the installation guide in our repository, then running the OpenChat OpenAI-compatible API server with the serving command from the table below. The server is optimized for high-throughput deployment using vLLM and can run on a consumer GPU with 24 GB of memory. To enable tensor parallelism, append `--tensor-parallel-size N` to the serving command.
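As a sketch of the tensor-parallel option, the shell function below wraps the serving command listed in the table below and appends `--tensor-parallel-size N` when more than one GPU is requested. The function name and the `DRY_RUN` switch are hypothetical conveniences of this example, not part of OpenChat; only the command and its flags come from this card.

```bash
# Hypothetical wrapper around the serving command from the table.
# $1: number of GPUs for tensor parallelism (default 1).
# DRY_RUN=1 (a switch of this sketch only) prints the command instead of running it.
serve_openchat_gemma() {
  local tp="${1:-1}"
  local -a cmd=(python -m ochat.serving.openai_api_server
                --model openchat/openchat-3.5-0106-gemma
                --engine-use-ray --worker-use-ray)
  if [ "$tp" -gt 1 ]; then
    cmd+=(--tensor-parallel-size "$tp")   # shard the model across $tp GPUs
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 serve_openchat_gemma 2` prints the two-GPU command, and `serve_openchat_gemma 2` starts it.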
Once started, the server listens for requests at `localhost:18888` and is compatible with the OpenAI ChatCompletion API specification; see the example request below. Alternatively, you can use the OpenChat Web UI for a user-friendly experience.
If you want to deploy the server as an online service, you can use `--api-keys sk-KEY1 sk-KEY2 ...` to specify the allowed API keys and `--disable-log-requests --disable-log-stats --log-file openchat.log` to log only to a file. For security, we recommend placing an HTTPS gateway in front of the server.
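Putting the flags from this paragraph together, a public deployment might look like the sketch below. `sk-KEY1` and `sk-KEY2` are the placeholders from the text, and both function names are hypothetical. The client call assumes the server checks keys sent in the standard OpenAI `Authorization: Bearer` header; verify this against your OpenChat version.

```bash
# Placeholder keys from the text; replace them with real secrets.
OPENCHAT_API_KEYS=(sk-KEY1 sk-KEY2)

# Hypothetical launcher: table command + allowed keys + file-only logging.
start_openchat_public() {
  python -m ochat.serving.openai_api_server \
    --model openchat/openchat-3.5-0106-gemma --engine-use-ray --worker-use-ray \
    --api-keys "${OPENCHAT_API_KEYS[@]}" \
    --disable-log-requests --disable-log-stats --log-file openchat.log
}

# Hypothetical client call; assumes OpenAI-style bearer-token authentication.
# In production, point this at your HTTPS gateway instead of localhost.
openchat_hello() {
  curl -sS http://localhost:18888/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${OPENCHAT_API_KEYS[0]}" \
    -d '{"model": "openchat_3.5_gemma_new", "messages": [{"role": "user", "content": "Hello"}]}'
}
```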
Model | Size | Context | Weights | Serving |
---|---|---|---|---|
OpenChat-3.5-0106-Gemma | 7B | 8192 | Huggingface | python -m ochat.serving.openai_api_server --model openchat/openchat-3.5-0106-gemma --engine-use-ray --worker-use-ray |
Example request
```bash
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_3.5_gemma_new",
    "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
  }'
```
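The inline JSON in the example request breaks as soon as a prompt contains a double quote or a newline. One way around this, sketched below, is to let Python's standard `json` module build the request body (Python is already needed for the OpenChat package) and pipe it to `curl`. The helper names are hypothetical; the URL and model name are taken from the example request.

```bash
# Hypothetical helper: build the request body with Python's json module so that
# quotes, backslashes and newlines in the prompt are escaped correctly.
make_chat_body() {
  python3 -c 'import json, sys
print(json.dumps({"model": sys.argv[1],
                  "messages": [{"role": "user", "content": sys.argv[2]}]}))' \
    "openchat_3.5_gemma_new" "$1"
}

# Hypothetical helper: POST one user message to the local server.
ask_openchat() {
  make_chat_body "$1" |
    curl -sS http://localhost:18888/v1/chat/completions \
      -H "Content-Type: application/json" --data-binary @-
}
```

Usage: `ask_openchat 'Explain the word "tokenizer" in one line.'`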
Conversation template
⚠️ Notice: This differs from the Mistral version. The end-of-turn token is now `<end_of_turn>` (the Mistral version uses `<|end_of_turn|>`). Remember to set `<end_of_turn>` as the end-of-generation token.
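How the end-of-generation token is configured depends on your inference engine. As an extra client-side safeguard (not a substitute for configuring the stop token), a completion can be cut at the first `<end_of_turn>`, as in this sketch; the function name is hypothetical. Note that the Mistral token `<|end_of_turn|>` does not match, which is exactly the mix-up the notice warns about.

```bash
# Hypothetical helper: keep only the text before the first <end_of_turn>.
truncate_at_eot() {
  local eot='<end_of_turn>'
  # "${1%%"$eot"*}" removes the longest suffix starting at an <end_of_turn>,
  # i.e. everything from the first occurrence onward.
  printf '%s' "${1%%"$eot"*}"
}
```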
GPT4 Correct User: Hello<end_of_turn>GPT4 Correct Assistant: Hi<end_of_turn>GPT4 Correct User: How are you today?<end_of_turn>GPT4 Correct Assistant:
With system message (NOT recommended, may degrade performance)
You are a helpful assistant.<end_of_turn>GPT4 Correct User: Hello<end_of_turn>GPT4 Correct Assistant: Hi<end_of_turn>GPT4 Correct User: How are you today?<end_of_turn>GPT4 Correct Assistant:
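Both templates can be produced by a small shell function such as the sketch below, which alternates user and assistant turns, closes each with `<end_of_turn>`, and leaves an open `GPT4 Correct Assistant:` turn for the model to complete. The function name and its `-s` option are hypothetical. With the three turns above it reproduces the first template, and with `-s 'You are a helpful assistant.'` the second.

```bash
# Hypothetical helper: render the OpenChat-3.5-0106-Gemma conversation template.
# Usage: build_openchat_prompt [-s SYSTEM] USER_1 [ASSISTANT_1 USER_2 ...]
build_openchat_prompt() {
  local eot='<end_of_turn>' out='' role turn i=0
  if [ "${1-}" = "-s" ]; then
    out="$2$eot"          # optional system message (NOT recommended)
    shift 2
  fi
  if (( $# % 2 == 0 )); then
    echo "build_openchat_prompt: expected an odd number of turns ending with a user turn" >&2
    return 1
  fi
  for turn in "$@"; do
    if (( i % 2 == 0 )); then role='GPT4 Correct User'; else role='GPT4 Correct Assistant'; fi
    out+="$role: $turn$eot"
    i=$((i + 1))
  done
  printf '%s' "${out}GPT4 Correct Assistant:"   # open turn for the model
}
```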