Hector Salvador [Fisharp]
Relocation of static contents to their own files
32f7b3e

Model Formats

The model is pretrained on code, and its training data contains special tokens in addition to the raw code,
such as prefixes specifying the source of a file or tokens separating code from a commit message.
Use these templates to explore the model's capabilities:

1. Prefixes 🏷️

For pure code files, use any combination of the following prefixes:

<reponame>REPONAME<filename>FILENAME<gh_stars>STARS\ncode<|endoftext|>

STARS can be one of: 0, 1-10, 10-100, 100-1000, 1000+
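As an illustration, the Python sketch below assembles this prefix template for a prompt. The function name `build_code_prompt` is hypothetical, and two details are assumptions not stated above: the newline before the code is emitted only when at least one prefix is present, and `<|endoftext|>` is left out because it marks the end of a training sample, which the model is expected to generate on its own.

```python
from typing import Optional

# Star buckets listed in the template description.
STAR_BUCKETS = {"0", "1-10", "10-100", "100-1000", "1000+"}


def build_code_prompt(
    code: str,
    reponame: Optional[str] = None,
    filename: Optional[str] = None,
    stars: Optional[str] = None,
) -> str:
    """Prepend any combination of metadata prefixes to a code prompt.

    Hypothetical helper. Assumes the newline separating metadata from code
    is only needed when a prefix is present, and leaves <|endoftext|> for
    the model to generate.
    """
    if stars is not None and stars not in STAR_BUCKETS:
        raise ValueError(f"stars must be one of {sorted(STAR_BUCKETS)}, got {stars!r}")
    parts = []
    # Keep the prefixes in the order shown in the template.
    if reponame is not None:
        parts.append(f"<reponame>{reponame}")
    if filename is not None:
        parts.append(f"<filename>{filename}")
    if stars is not None:
        parts.append(f"<gh_stars>{stars}")
    header = "".join(parts)
    return f"{header}\n{code}" if header else code


print(repr(build_code_prompt(
    "def add(a, b):\n",
    reponame="octo/utils",
    filename="math_utils.py",
    stars="100-1000",
)))
# '<reponame>octo/utils<filename>math_utils.py<gh_stars>100-1000\ndef add(a, b):\n'
```

Validating `stars` against the fixed bucket strings, rather than converting a raw star count, avoids guessing which bucket a boundary value such as 10 belongs to.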

2. Commits 💾

Commit data is formatted as follows:

<commit_before>code<commit_msg>text<commit_after>code<|endoftext|>
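A minimal Python sketch of how this template might be used to ask for an edit: the prompt stops right after `<commit_after>` so that the model writes the post-commit code. Both helper names are hypothetical, and the extraction step assumes the generated text either echoes the prompt or contains only the continuation, depending on how generation is set up.

```python
END_OF_TEXT = "<|endoftext|>"


def build_commit_prompt(code_before: str, message: str) -> str:
    """Hypothetical helper: ask the model for the code after a commit."""
    return f"<commit_before>{code_before}<commit_msg>{message}<commit_after>"


def extract_code_after(text: str) -> str:
    """Hypothetical helper: pull the post-commit code out of generated text."""
    # If the prompt is echoed back, keep only what follows <commit_after>.
    _, sep, after = text.partition("<commit_after>")
    body = after if sep else text
    # Stop at the end-of-text token if the model emitted one.
    return body.split(END_OF_TEXT, 1)[0]


prompt = build_commit_prompt("x = 1\n", "Rename x to count")
# Stand-in for a real model completion that echoes the prompt.
fake_completion = prompt + "count = 1\n" + END_OF_TEXT
print(repr(extract_code_after(fake_completion)))  # 'count = 1\n'
```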

3. Jupyter Notebooks πŸ““

The model is trained on Jupyter notebooks, both converted to Python scripts and in a structured format like:

<start_jupyter><jupyter_text>text<jupyter_code>code<jupyter_output>output<jupyter_text>
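The structured notebook format can be produced by serializing cells in order, as in the Python sketch below. The `(kind, content)` cell representation and the function name are hypothetical; ending the list with an empty code cell leaves a trailing `<jupyter_code>` token, inviting the model to write the next cell.

```python
from typing import Iterable, Tuple

# Maps a hypothetical cell-kind label to the token shown in the template.
CELL_TOKENS = {
    "text": "<jupyter_text>",
    "code": "<jupyter_code>",
    "output": "<jupyter_output>",
}


def build_notebook_prompt(cells: Iterable[Tuple[str, str]]) -> str:
    """Serialize (kind, content) cells into the structured notebook format."""
    parts = ["<start_jupyter>"]
    for kind, content in cells:
        if kind not in CELL_TOKENS:
            raise ValueError(f"unknown cell kind: {kind!r}")
        parts.append(CELL_TOKENS[kind] + content)
    return "".join(parts)


prompt = build_notebook_prompt([
    ("text", "Add two numbers"),
    ("code", "print(1 + 1)"),
    ("output", "2"),
    ("text", "Now square the result"),
    ("code", ""),  # empty cell: the model is asked to fill it in
])
print(prompt)
```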

4. Issues πŸ›

We also trained on GitHub issues using the following formatting:

<issue_start><issue_comment>text<issue_comment>...<issue_closed>
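A short Python sketch of the issue format, assuming each comment is a plain string and that `<issue_closed>` is appended only when the thread is finished. The function name is hypothetical; to ask the model for a reply, one option is to leave the thread open and end the prompt with a fresh `<issue_comment>` token.

```python
from typing import List


def build_issue_prompt(comments: List[str], closed: bool = False) -> str:
    """Hypothetical helper: serialize an issue thread into the issue format."""
    body = "".join(f"<issue_comment>{c}" for c in comments)
    # Assumption: <issue_closed> only terminates a finished thread.
    return "<issue_start>" + body + ("<issue_closed>" if closed else "")


# Leave the thread open and start a new comment for the model to write.
prompt = build_issue_prompt(
    ["The parser crashes on empty input.", "Can you share a stack trace?"]
) + "<issue_comment>"
print(prompt)
```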

5. Fill-in-the-middle 🧩

Fill-in-the-middle requires rearranging the model inputs. The playground handles this for you; all you need to do is specify where to fill:

code before<FILL_HERE>code after
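For readers calling the model outside the playground, the rearrangement can be sketched as below. This text does not name the special tokens the playground uses, so the defaults are an assumption: the prefix-suffix-middle layout with `<fim_prefix>`, `<fim_suffix>` and `<fim_middle>`, as found in StarCoder-family tokenizers. They are exposed as parameters so they can be swapped for whatever your tokenizer defines; the function name is hypothetical.

```python
FILL_MARKER = "<FILL_HERE>"


def build_fim_prompt(
    text: str,
    prefix_tok: str = "<fim_prefix>",
    suffix_tok: str = "<fim_suffix>",
    middle_tok: str = "<fim_middle>",
) -> str:
    """Rearrange 'before<FILL_HERE>after' into prefix-suffix-middle order.

    The default token names are an assumption; check your tokenizer's
    special tokens before relying on them.
    """
    if text.count(FILL_MARKER) != 1:
        raise ValueError(f"expected exactly one {FILL_MARKER} marker")
    before, after = text.split(FILL_MARKER)
    # The model sees the code before and after the gap, then generates the middle.
    return f"{prefix_tok}{before}{suffix_tok}{after}{middle_tok}"


print(repr(build_fim_prompt("def square(x):\n    return <FILL_HERE>\n")))
# '<fim_prefix>def square(x):\n    return <fim_suffix>\n<fim_middle>'
```

Whatever the model generates after the middle token is the text that belongs at the `<FILL_HERE>` position.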