CISCai committed on
Commit a5f937a
1 Parent(s): d1f6790

Requantized with fixed imatrix

README.md CHANGED
@@ -1,5 +1,6 @@
 ---
 license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - code
 language:
@@ -24,12 +25,12 @@ This repo contains State Of The Art quantized GGUF format model files for [Yi-Co
 
 Quantization was done with an importance matrix that was trained for ~1M tokens (256 batches of 4096 tokens) of answers from the [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) dataset.
 
+**Update September 19th**: Requantized with new imatrix after finding a [bug](https://github.com/ggerganov/llama.cpp/pull/9543) in `llama-imatrix` that degraded the data set. Also removed the Fill-in-Middle tokens as they are [not properly supported](https://huggingface.co/01-ai/Yi-Coder-9B-Chat/discussions/5).
+
 **Update September 5th**: Marked <|im_start|> as a special token, fixing tokenization.
 
 Corrected EOS (<|im_end|>) and added EOT (<|endoftext|>) token to prevent infinite responses (am I the only one actually dog-fooding my own quants?).
 
-Fill-in-Middle token metadata has been added, see [example](#simple-llama-cpp-python-example-fill-in-middle-code). NOTE: Yi's FIM requires support for [SPM infill mode](https://github.com/abetlen/llama-cpp-python/pull/1492)! However it seems it has not been extensively trained for this (perhaps not at all), so don't expect particularly great results...
-
 <!-- description end -->
 
 
@@ -177,38 +178,6 @@ print(llm.create_chat_completion(
 ))
 ```
 
-#### Simple llama-cpp-python example fill-in-middle code
-
-```python
-from llama_cpp import Llama
-
-# Completion API
-
-prompt = "def add("
-suffix = "\n return sum\n\n"
-
-llm = Llama(model_path="./Yi-Coder-9B-Chat.IQ4_XS.gguf", n_gpu_layers=49, n_ctx=131072, spm_infill=True)
-output = llm.create_completion(
-    temperature = 0.0,
-    repeat_penalty = 1.0,
-    prompt = prompt,
-    suffix = suffix
-)
-
-# Models sometimes repeat suffix in response, attempt to filter that
-response = output["choices"][0]["text"]
-response_stripped = response.rstrip()
-unwanted_response_suffix = suffix.rstrip()
-unwanted_response_length = len(unwanted_response_suffix)
-
-filtered = False
-if unwanted_response_suffix and response_stripped[-unwanted_response_length:] == unwanted_response_suffix:
-    response = response_stripped[:-unwanted_response_length]
-    filtered = True
-
-print(f"Fill-in-Middle completion{' (filtered)' if filtered else ''}:\n\n{prompt}\033[32m{response}\033[{'33' if filtered else '0'}m{suffix}\033[0m")
-```
-
 <!-- README_GGUF.md-how-to-run end -->
 
 <!-- original-model-card start -->
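The suffix-filtering step in the removed fill-in-middle example is plain string handling and stands on its own; a minimal standalone sketch (the helper name `strip_repeated_suffix` is illustrative, not part of the repo):

```python
def strip_repeated_suffix(response: str, suffix: str) -> tuple[str, bool]:
    """Drop the infill suffix if the model echoed it at the end of its response.

    Mirrors the filtering step from the removed fill-in-middle example.
    """
    response_stripped = response.rstrip()
    unwanted = suffix.rstrip()
    if unwanted and response_stripped.endswith(unwanted):
        # Model repeated the suffix; keep only the generated middle part.
        return response_stripped[: -len(unwanted)], True
    return response, False

# The model echoed the suffix here, so it gets stripped:
print(strip_repeated_suffix("a + b\n    return sum\n", "\n    return sum\n\n"))
# → ('a + b', True)
```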
Yi-Coder-9B-Chat-bf16-00001-of-00002.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c1f9006b2b3fabab9f796b643fc488ee6c4ead7522768c76e69d2761ff2ee3c3
-size 1478133
+oid sha256:2cb211e2ee5e0276772bb952baa723c624578b8b16eb1531243ce5cae4713976
+size 1477995
Yi-Coder-9B-Chat.IQ1_M.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6ac4ada21f1e92506dfa0c63d4b77d488ebed479c6d8e21ca1ed767101cb37ee
-size 2181641152
+oid sha256:b1a7af2e7b3fc3e85e811a09a1701ea1f6cc9f025af04d7629df97c24a70bf16
+size 2181641024
Yi-Coder-9B-Chat.IQ1_S.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:36b3f5ce2f04d458a72c98819569873ac08ad28017034868050b368070207e80
-size 2014573504
+oid sha256:5180f9e3d872d3db1b7caff4f35e6d7973dc5dd8596ad84522e9056df5927a37
+size 2014573376
Yi-Coder-9B-Chat.IQ2_M.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:12203a97c90c2464a78fe97aca2d8f9b371ed08d374da72456f33de0d3635dda
-size 3098112960
+oid sha256:f5515525a85fe96ffacb3e9133a72375156c0be063801e2c1db3c001806e681e
+size 3098112832
Yi-Coder-9B-Chat.IQ2_S.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1e8d3345865f550163495d1c1296d0f984bcafab82042f4bb0b4cf5c4df908dc
-size 2875356096
+oid sha256:82f1a4a9d69180978e90de03a0c7b2682cb026383d723a8b043924d7c002171e
+size 2875355968
Yi-Coder-9B-Chat.IQ2_XS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:14f905b46f88737326eeecb13441f471e01d58b972eab1ece759186017339c52
-size 2708009920
+oid sha256:e6124847223b973dbb776b467e35ce7121f532b2301327abc66a55b1ab768452
+size 2708009792
Yi-Coder-9B-Chat.IQ2_XXS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f8b1337522073c78a4b90692912b61930d2c5f0bcc6670f5d95c2a209ce79c2e
-size 2460087232
+oid sha256:5508f3e75294d78cd6b7cf7cca5fbb2a4a86ec264f7a20cffb6a91cebc23b1ad
+size 2460087104
Yi-Coder-9B-Chat.IQ3_M.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:acf6c496ce02c15ed057f7f3309d6c6ced33268c89dc6b9affbe961f80ec8a4b
-size 4055462848
+oid sha256:7df755d7533e67a5abc51eb666bbf8262317eebcd21b31ef772576d859260332
+size 4055462720
Yi-Coder-9B-Chat.IQ3_S.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9e7b5d91cbc6672ef853eac657bd0b5878c935bdf4a9b4d0ff1fe9ed515a283e
-size 3912577984
+oid sha256:5435e44b532b27c33567f07187f516f788fee8dc90439ea56c7599547cc25aab
+size 3912577856
Yi-Coder-9B-Chat.IQ3_XS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7a3d2ae4e215f3f316c5fc9b93bc795657d87df542dddbd98299a1fe2f6b2e6c
-size 3717936064
+oid sha256:f6e4fbe44ee79d4be26a70169578260d223a145e2a152edcc653d9af2a2ca34d
+size 3717935936
Yi-Coder-9B-Chat.IQ3_XXS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f4017cf9b95ec59351acd8ba840af6dc30dfff43d6212b956916f94f8b8dc271
-size 3474322368
+oid sha256:c25f9a7bec86e399e01d7af69509d63c55e2bb4919b9991258a95b2aa1587753
+size 3474322240
Yi-Coder-9B-Chat.IQ4_XS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:57a7ef81c547163c623e97996d0e7e6c9f6196cb7878c6d04c1a6b47cb006e8a
-size 4785009600
+oid sha256:7dc8ca686e3a721fb92d7eb8dd8e2ffb2632543d90cdc9f093da82cc934eb517
+size 4785009472
Yi-Coder-9B-Chat.imatrix.dat CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:90f8ed0f6830f6deb8dbf36baa781cacfa01e4261ac9ac69c0373613fa7345e2
+oid sha256:a3b191c30944617e38f152af64a72d26c0f5e2ec2999ae1df09ce54adf61264e
 size 6843280
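Each pointer file above pairs an `oid sha256:` with a `size`, which is all Git LFS stores in the repo for a large blob. How such a pointer is derived from a blob's bytes can be sketched as follows (the `lfs_pointer` helper is illustrative, not part of the repo or of Git LFS itself):

```python
import hashlib

def lfs_pointer(data: bytes) -> str:
    """Build the Git LFS pointer text for a blob: spec version, SHA-256 oid, byte size."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )

print(lfs_pointer(b"hello"))
```

Comparing a downloaded file's actual SHA-256 and size against the pointer is a quick way to confirm an LFS fetch completed correctly.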