Turbcat 8b
Release notes
This is a direct upgrade over cat 70B, with 2x the dataset size, added Chinese support.
Data Generation
The data generation process is largely the same. Except additional Chinese data were added. 20 postdocs participated in the annotation process and standard model training for embedding scoring.
Task coverage
In addition to standard assistant and roleplay data, the following tasks are targeted:
- GRE
- SAT
- MCAT
- Chinese Kaoyan
Thirdparty dataset
Thanks to the following people for their tremendous support for dataset generation:
- steelskull for the medical COT dataset with gpt4o
- Gryphe for the wonderful action packed dataset
- Turbca for being turbca
Prompt format for 8b:
llama3
Prompt format for 72b:
chatml
Support
Please join https://discord.gg/DwGz54Mz for model support