lvwerra HF staff committed on
Commit
4d0f760
1 Parent(s): b033af5

Update app.py

Files changed (1)
  1. app.py +9 -7
app.py CHANGED
@@ -20,34 +20,36 @@ FIM_INDICATOR = "<FILL_HERE>"
 
 FORMATS = """## Model formats
 
+The model is pretrained on code and in addition to the pure code data it is formatted with special tokens. E.g. prefixes specifying the source of the file or special tokens separating code from a commit message. See below:
+
 ### Prefixes
-Any combination of the three:
+Any combination of the three following prefixes can be found in pure code files:
 
 ```
-<reponame>REPONAME<filename>FILENAME<gh_stars>STARS\nCode<eos>
+<reponame>REPONAME<filename>FILENAME<gh_stars>STARS\ncode<|endoftext|>
 ```
-Stars be: 0, 1-10, 10-100, 100-1000, 1000+
+STARS can be one of: 0, 1-10, 10-100, 100-1000, 1000+
 
 ### Commits
-
+The commits data is formatted as follows:
 ```
 <commit_before>code<commit_msg>text<commit_after>code<|endoftext|>
 ```
 
 ### Jupyter structure
-
+Jupyter notebooks were both trained in form of Python scripts as well as the following structured format:
 ```
 <start_jupyter><jupyter_text>text<jupyter_code>code<jupyter_output>output<jupyter_text>
 ```
 
 ### Issues
-
+We also trained on GitHub issues using the following formatting:
 ```
 <issue_start><issue_comment>text<issue_comment>...<issue_closed>
 ```
 
 ### Fill-in-the-middle
-
+Fill in the middle requires rearranging the model inputs. The playground does this for you - all you need is to specify where to fill:
 ```
 code before<FILL_HERE>code after
 ```
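
For readers of this commit, the formats described in the new `FORMATS` text could be assembled roughly as follows. This is a minimal sketch, not the code in `app.py`: the helper names (`star_bucket`, `format_code_file`, `format_commit`, `build_fim_prompt`) are hypothetical, the bucket boundaries are a guess (the text lists overlapping ranges such as 1-10 and 10-100), and the `<fim_prefix>`, `<fim_suffix>`, `<fim_middle>` sentinel tokens and their ordering are assumptions, since the diff only says that the inputs are rearranged.

```python
FIM_INDICATOR = "<FILL_HERE>"  # taken from the hunk header of app.py
EOS = "<|endoftext|>"          # end-of-text token shown in the new FORMATS text

# Assumed sentinel tokens; the diff does not name them.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def star_bucket(stars: int) -> str:
    """Map a raw star count to one of the documented STARS values.

    Boundary handling (e.g. whether 10 is "1-10" or "10-100") is an assumption.
    """
    if stars <= 0:
        return "0"
    if stars <= 10:
        return "1-10"
    if stars <= 100:
        return "10-100"
    if stars <= 1000:
        return "100-1000"
    return "1000+"


def format_code_file(code, reponame=None, filename=None, stars=None):
    """Prepend any combination of the three prefixes to a pure code file."""
    parts = []
    if reponame is not None:
        parts.append(f"<reponame>{reponame}")
    if filename is not None:
        parts.append(f"<filename>{filename}")
    if stars is not None:
        parts.append(f"<gh_stars>{star_bucket(stars)}")
    prefix = "".join(parts)
    # Assumption: the newline separator appears only when a prefix is present.
    if prefix:
        prefix += "\n"
    return f"{prefix}{code}{EOS}"


def format_commit(before: str, message: str, after: str) -> str:
    """Lay out a commit as code before, commit message, code after."""
    return f"<commit_before>{before}<commit_msg>{message}<commit_after>{after}{EOS}"


def build_fim_prompt(text: str) -> str:
    """Rearrange 'code before<FILL_HERE>code after' into a fill-in-the-middle prompt."""
    if text.count(FIM_INDICATOR) != 1:
        raise ValueError(f"expected exactly one {FIM_INDICATOR} marker")
    before, after = text.split(FIM_INDICATOR)
    # Assumed ordering: prefix, then suffix, then the middle marker the model continues from.
    return f"{FIM_PREFIX}{before}{FIM_SUFFIX}{after}{FIM_MIDDLE}"
```

For example, `build_fim_prompt("def f():\n    <FILL_HERE>\n    return x")` moves the code after the marker ahead of the position where the model starts generating, which is the rearrangement the playground is described as doing on the user's behalf.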