CISCai committed on
Commit a5f937a
1 Parent(s): d1f6790

Requantized with fixed imatrix

README.md CHANGED
@@ -1,5 +1,6 @@
 ---
 license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - code
 language:
@@ -24,12 +25,12 @@ This repo contains State Of The Art quantized GGUF format model files for [Yi-Co
 
 Quantization was done with an importance matrix that was trained for ~1M tokens (256 batches of 4096 tokens) of answers from the [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) dataset.
 
+**Update September 19th**: Requantized with new imatrix after finding a [bug](https://github.com/ggerganov/llama.cpp/pull/9543) in `llama-imatrix` that degraded the data set. Also removed the Fill-in-Middle tokens as they are [not properly supported](https://huggingface.co/01-ai/Yi-Coder-9B-Chat/discussions/5).
+
 **Update September 5th**: Marked <|im_start|> as a special token, fixing tokenization.
 
 Corrected EOS (<|im_end|>) and added EOT (<|endoftext|>) token to prevent infinite responses (am I the only one actually dog-fooding my own quants?).
 
-Fill-in-Middle token metadata has been added, see [example](#simple-llama-cpp-python-example-fill-in-middle-code). NOTE: Yi's FIM requires support for [SPM infill mode](https://github.com/abetlen/llama-cpp-python/pull/1492)! However it seems it has not been extensively trained for this (perhaps not at all), so don't expect particularly great results...
-
 <!-- description end -->
 
 
@@ -177,38 +178,6 @@ print(llm.create_chat_completion(
 ))
 ```
 
-#### Simple llama-cpp-python example fill-in-middle code
-
-```python
-from llama_cpp import Llama
-
-# Completion API
-
-prompt = "def add("
-suffix = "\n return sum\n\n"
-
-llm = Llama(model_path="./Yi-Coder-9B-Chat.IQ4_XS.gguf", n_gpu_layers=49, n_ctx=131072, spm_infill=True)
-output = llm.create_completion(
-    temperature = 0.0,
-    repeat_penalty = 1.0,
-    prompt = prompt,
-    suffix = suffix
-)
-
-# Models sometimes repeat suffix in response, attempt to filter that
-response = output["choices"][0]["text"]
-response_stripped = response.rstrip()
-unwanted_response_suffix = suffix.rstrip()
-unwanted_response_length = len(unwanted_response_suffix)
-
-filtered = False
-if unwanted_response_suffix and response_stripped[-unwanted_response_length:] == unwanted_response_suffix:
-    response = response_stripped[:-unwanted_response_length]
-    filtered = True
-
-print(f"Fill-in-Middle completion{' (filtered)' if filtered else ''}:\n\n{prompt}\033[32m{response}\033[{'33' if filtered else '0'}m{suffix}\033[0m")
-```
-
 <!-- README_GGUF.md-how-to-run end -->
 
 <!-- original-model-card start -->
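The suffix-filtering step in the removed fill-in-middle example is plain string handling and stands on its own; a minimal standalone sketch (the helper name `strip_repeated_suffix` is illustrative, not part of the repo):

```python
def strip_repeated_suffix(response: str, suffix: str) -> tuple[str, bool]:
    """Drop the infill suffix if the model echoed it at the end of its response.

    Mirrors the filtering step from the removed fill-in-middle example.
    """
    response_stripped = response.rstrip()
    unwanted = suffix.rstrip()
    if unwanted and response_stripped.endswith(unwanted):
        # Model repeated the suffix; keep only the generated middle part.
        return response_stripped[: -len(unwanted)], True
    return response, False

# The model echoed the suffix here, so it gets stripped:
print(strip_repeated_suffix("a + b\n    return sum\n", "\n    return sum\n\n"))
# → ('a + b', True)
```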
Yi-Coder-9B-Chat-bf16-00001-of-00002.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c1f9006b2b3fabab9f796b643fc488ee6c4ead7522768c76e69d2761ff2ee3c3
-size 1478133
+oid sha256:2cb211e2ee5e0276772bb952baa723c624578b8b16eb1531243ce5cae4713976
+size 1477995
Yi-Coder-9B-Chat.IQ1_M.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6ac4ada21f1e92506dfa0c63d4b77d488ebed479c6d8e21ca1ed767101cb37ee
-size 2181641152
+oid sha256:b1a7af2e7b3fc3e85e811a09a1701ea1f6cc9f025af04d7629df97c24a70bf16
+size 2181641024
Yi-Coder-9B-Chat.IQ1_S.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:36b3f5ce2f04d458a72c98819569873ac08ad28017034868050b368070207e80
-size 2014573504
+oid sha256:5180f9e3d872d3db1b7caff4f35e6d7973dc5dd8596ad84522e9056df5927a37
+size 2014573376
Yi-Coder-9B-Chat.IQ2_M.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:12203a97c90c2464a78fe97aca2d8f9b371ed08d374da72456f33de0d3635dda
-size 3098112960
+oid sha256:f5515525a85fe96ffacb3e9133a72375156c0be063801e2c1db3c001806e681e
+size 3098112832
Yi-Coder-9B-Chat.IQ2_S.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1e8d3345865f550163495d1c1296d0f984bcafab82042f4bb0b4cf5c4df908dc
-size 2875356096
+oid sha256:82f1a4a9d69180978e90de03a0c7b2682cb026383d723a8b043924d7c002171e
+size 2875355968
Yi-Coder-9B-Chat.IQ2_XS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:14f905b46f88737326eeecb13441f471e01d58b972eab1ece759186017339c52
-size 2708009920
+oid sha256:e6124847223b973dbb776b467e35ce7121f532b2301327abc66a55b1ab768452
+size 2708009792
Yi-Coder-9B-Chat.IQ2_XXS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f8b1337522073c78a4b90692912b61930d2c5f0bcc6670f5d95c2a209ce79c2e
-size 2460087232
+oid sha256:5508f3e75294d78cd6b7cf7cca5fbb2a4a86ec264f7a20cffb6a91cebc23b1ad
+size 2460087104
Yi-Coder-9B-Chat.IQ3_M.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:acf6c496ce02c15ed057f7f3309d6c6ced33268c89dc6b9affbe961f80ec8a4b
-size 4055462848
+oid sha256:7df755d7533e67a5abc51eb666bbf8262317eebcd21b31ef772576d859260332
+size 4055462720
Yi-Coder-9B-Chat.IQ3_S.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9e7b5d91cbc6672ef853eac657bd0b5878c935bdf4a9b4d0ff1fe9ed515a283e
-size 3912577984
+oid sha256:5435e44b532b27c33567f07187f516f788fee8dc90439ea56c7599547cc25aab
+size 3912577856
Yi-Coder-9B-Chat.IQ3_XS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7a3d2ae4e215f3f316c5fc9b93bc795657d87df542dddbd98299a1fe2f6b2e6c
-size 3717936064
+oid sha256:f6e4fbe44ee79d4be26a70169578260d223a145e2a152edcc653d9af2a2ca34d
+size 3717935936
Yi-Coder-9B-Chat.IQ3_XXS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f4017cf9b95ec59351acd8ba840af6dc30dfff43d6212b956916f94f8b8dc271
-size 3474322368
+oid sha256:c25f9a7bec86e399e01d7af69509d63c55e2bb4919b9991258a95b2aa1587753
+size 3474322240
Yi-Coder-9B-Chat.IQ4_XS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:57a7ef81c547163c623e97996d0e7e6c9f6196cb7878c6d04c1a6b47cb006e8a
-size 4785009600
+oid sha256:7dc8ca686e3a721fb92d7eb8dd8e2ffb2632543d90cdc9f093da82cc934eb517
+size 4785009472
Yi-Coder-9B-Chat.imatrix.dat CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:90f8ed0f6830f6deb8dbf36baa781cacfa01e4261ac9ac69c0373613fa7345e2
+oid sha256:a3b191c30944617e38f152af64a72d26c0f5e2ec2999ae1df09ce54adf61264e
 size 6843280
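Each pointer file above pairs an `oid sha256:` with a `size`, which is all Git LFS stores in the repo for a large blob. How such a pointer is derived from a blob's bytes can be sketched as follows (the `lfs_pointer` helper is illustrative, not part of the repo or of Git LFS itself):

```python
import hashlib

def lfs_pointer(data: bytes) -> str:
    """Build the Git LFS pointer text for a blob: spec version, SHA-256 oid, byte size."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )

print(lfs_pointer(b"hello"))
```

Comparing a downloaded file's actual SHA-256 and size against the pointer is a quick way to confirm an LFS fetch completed correctly.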