Turbcat 8b

Release notes

This is a direct upgrade over cat 70B, with 2x the dataset size, added Chinese support.

Data Generation

The data generation process is largely the same. Except additional Chinese data were added. 20 postdocs participated in the annotation process and standard model training for embedding scoring.

Task coverage

In addition to standard assistant and roleplay data, the following tasks are targeted:

GRE
SAT
MCAT
Chinese Kaoyan

Thirdparty dataset

Thanks to the following people for their tremendous support for dataset generation:

steelskull for the medical COT dataset with gpt4o
Gryphe for the wonderful action packed dataset
Turbca for being turbca

Prompt format for 8b:

llama3

Prompt format for 72b:

chatml

Support

Please join https://discord.gg/DwGz54Mz for model support