starcoder-playground

Hector Salvador [Fisharp]

	## Model Formats

	The model was pretrained on code that is formatted with special tokens in addition to the raw code itself,
	such as prefixes identifying the source of a file or tokens separating code from a commit message.
	Use these templates to explore the model's capabilities:

	### 1. Prefixes 🏷️

	For pure code files, use any combination of the following prefixes:

	```
	<reponame>REPONAME<filename>FILENAME<gh_stars>STARS\ncode<|endoftext|>
	```

	STARS can be one of: 0, 1-10, 10-100, 100-1000, 1000+
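
As a rough sketch, a prefix prompt can be assembled with a small helper like the one below. The function name `build_prefix_prompt` and its parameters are hypothetical and not part of the playground. It assumes you want the model to continue the code, so it leaves off the closing end-of-text token.

```python
# Star buckets listed above; any other value is rejected.
ALLOWED_STARS = ("0", "1-10", "10-100", "100-1000", "1000+")


def build_prefix_prompt(code, reponame=None, filename=None, stars=None):
    """Assemble a prefix-style prompt (hypothetical helper, not a playground API)."""
    if stars is not None and stars not in ALLOWED_STARS:
        raise ValueError(f"stars must be one of {ALLOWED_STARS}, got {stars!r}")

    parts = []
    if reponame is not None:
        parts.append(f"<reponame>{reponame}")
    if filename is not None:
        parts.append(f"<filename>{filename}")
    if stars is not None:
        parts.append(f"<gh_stars>{stars}")

    # A newline separates the prefixes from the code, as in the template.
    header = "".join(parts)
    return f"{header}\n{code}" if header else code


print(build_prefix_prompt("def add(a, b):", filename="math_utils.py", stars="100-1000"))
```

Any subset of the three prefixes can be passed; omitted ones are simply skipped.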

	### 2. Commits 💾

	The commits data is formatted as follows:

	```
	<commit_before>code<commit_msg>text<commit_after>code<|endoftext|>
	```
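
One way to use this format is to supply the original code and a commit message, then stop at `<commit_after>` so the model generates the edited code. The sketch below assumes that usage; `build_commit_prompt` is a hypothetical helper name.

```python
def build_commit_prompt(code_before, commit_msg):
    """Build a commit-style prompt that ends where the edited code should begin.

    Hypothetical helper: the prompt stops at <commit_after> on the assumption
    that the model is asked to produce the post-commit code.
    """
    return f"<commit_before>{code_before}<commit_msg>{commit_msg}<commit_after>"


prompt = build_commit_prompt(
    "def greet():\n    print('helo')\n",
    "Fix typo in greeting",
)
print(prompt)
```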

	### 3. Jupyter Notebooks 📓

	The model is trained on Jupyter notebooks, both converted to Python scripts and in structured formats like:

	```
	<start_jupyter><jupyter_text>text<jupyter_code>code<jupyter_output>output<jupyter_text>
	```
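
The structured notebook format can be sketched as a sequence of tagged cells. In the hypothetical helper below, `build_jupyter_prompt` takes `(kind, content)` pairs, and an empty final cell leaves its tag open for the model to complete; both choices are assumptions made for illustration.

```python
# Map each cell kind to the tag used in the template above.
CELL_TAGS = {
    "text": "<jupyter_text>",
    "code": "<jupyter_code>",
    "output": "<jupyter_output>",
}


def build_jupyter_prompt(cells):
    """Join (kind, content) pairs into a notebook-style prompt (hypothetical helper)."""
    parts = ["<start_jupyter>"]
    for kind, content in cells:
        if kind not in CELL_TAGS:
            raise ValueError(f"unknown cell kind: {kind!r}")
        parts.append(CELL_TAGS[kind] + content)
    return "".join(parts)


# End on an empty code cell so the model writes the code for the description.
print(build_jupyter_prompt([("text", "Plot a sine wave"), ("code", "")]))
```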

	### 4. Issues 🐛

	We also trained on GitHub issues using the following formatting:

	```
	<issue_start><issue_comment>text<issue_comment>...<issue_closed>
	```
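
A minimal sketch of this format follows, assuming the elided part of the template stands for further comments. The helper name `build_issue_prompt` and its `closed` flag are hypothetical.

```python
def build_issue_prompt(comments, closed=False):
    """Format issue comments with the tags shown above (hypothetical helper).

    Assumes each entry in `comments` becomes one <issue_comment> segment.
    Leave `closed` False to let the model continue the discussion.
    """
    if not comments:
        raise ValueError("at least one comment is required")
    body = "".join(f"<issue_comment>{text}" for text in comments)
    return "<issue_start>" + body + ("<issue_closed>" if closed else "")


print(build_issue_prompt(["Crash on empty input", "Can you share a traceback?"]))
```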

	### 5. Fill-in-the-middle 🧩

	Fill-in-the-middle requires rearranging the model inputs. The playground handles this for you; all you need to do is mark where to fill:

	```
	code before<FILL_HERE>code after
	```
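
For readers calling the model outside the playground, the rearrangement might look roughly like the sketch below. It assumes the `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` special tokens used by StarCoder and a prefix-suffix-middle ordering; `build_fim_prompt` is a hypothetical name, and the playground's actual implementation may differ.

```python
FIM_INDICATOR = "<FILL_HERE>"


def build_fim_prompt(text):
    """Rearrange text containing one <FILL_HERE> marker into a FIM prompt.

    Assumes StarCoder-style tokens in prefix-suffix-middle order; this is a
    sketch, not the playground's own code.
    """
    if text.count(FIM_INDICATOR) != 1:
        raise ValueError(f"expected exactly one {FIM_INDICATOR} marker")
    prefix, suffix = text.split(FIM_INDICATOR)
    # The model is expected to generate the missing middle after <fim_middle>.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"


print(build_fim_prompt("def add(a, b):\n    return <FILL_HERE>\n"))
```

The generated text is the middle segment, which you would splice back between the original prefix and suffix.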