turboderp's picture
Upload 17 files
dd0cc25 verified
|
raw
history blame
838 Bytes

Turbcat 8b

Release notes

This is a direct upgrade over cat 70B, with 2x the dataset size, added Chinese support.

Data Generation

The data generation process is largely the same. Except additional Chinese data were added. 20 postdocs participated in the annotation process and standard model training for embedding scoring.

Task coverage

In addition to standard assistant and roleplay data, the following tasks are targeted:

  • GRE
  • SAT
  • MCAT
  • Chinese Kaoyan

Thirdparty dataset

Thanks to the following people for their tremendous support for dataset generation:

  • steelskull for the medical COT dataset with gpt4o
  • Gryphe for the wonderful action packed dataset
  • Turbca for being turbca

Prompt format for 8b:

llama3

Prompt format for 72b:

chatml

Support

Please join https://discord.gg/DwGz54Mz for model support