---
title: README
emoji: 🐱
colorFrom: green
colorTo: red
sdk: static
pinned: false
---

# 🐱 KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding

KodCode is the largest fully synthetic open-source dataset providing verifiable solutions and tests for coding tasks. It contains 12 distinct subsets spanning various domains (from algorithmic to package-specific knowledge) and difficulty levels (from basic coding exercises to interview and competitive programming challenges). KodCode is designed for both supervised fine-tuning (SFT) and reinforcement learning (RL) tuning.


<div align="center">

🕸️ [Project Website](https://kodcode-ai.github.io/) | 📄 [Technical Report](https://arxiv.org/abs/2503.02951) | 💾 [Github Repo](https://github.com/KodCode-AI/kodcode) | 🤗 [KodCode-V1 (For RL)](https://huggingface.co/datasets/KodCode/KodCode-V1) | 🤗 [KodCode-V1-SFT-R1 (for SFT)](https://huggingface.co/datasets/KodCode/KodCode-V1-SFT-R1)

</div>