KrisPi/Wizard-Coder-0.66-Redmond-Hermes-0.33-ct2fast

KrisPi committed on Aug 18, 2023

Commit 305c966 • 1 Parent(s): c888f6b

Update README.md

Files changed (1)

README.md +26 -10

@@ -1,7 +1,7 @@
 ---
 license: openrail
 ---
-**This model is merge between 66% of Wizard Coder and 33% of Redmond Hermes Coder (which is Wizard Coder fine-tune):**
 https://huggingface.co/NousResearch/Redmond-Hermes-Coder
 https://huggingface.co/WizardLM/WizardCoder-15B-V1.0
@@ -10,18 +10,23 @@ Merger done by the most basic value average.
 Using CTranslate2 for quantization and inference achieving as much as 37 tokens /s on RTX 3090 GPU.
-Inference done by using text-generation-webui:
-Added this code and ran update on requirements.txt: https://github.com/oobabooga/text-generation-webui/pull/2828
 There is one thing extra to be changed in the code: reply = apply_extensions('output', reply) to: reply = apply_extensions('output', reply, state)
-The idea was to get some of the coding abilities back that were lost in fine-tune, but retain at least basic capabilities to summarize text and work with context. This experiment was also focused on using CT2 for its speed. **I believe presented approach is the best available compromise between speed, coding accuracy and a little of general LLM use. ~Please note that CT2 8bit quant seems to have better HumanEval scores than load-in-8bit~**
-Community now mostly focus on making non-coding models - code as making coding models be more general seems near impossible.
-However, my daily use is focused around DevOps questions, summarizing content and script development. Further development will be around intent analysis for integration with TODO lists and calendar extracting actions and notes from my voice transcription. This model doesn't seem to work well enough on those tasks so next time will attempt actual fine-tunes of Wizard Coder or just running two models at the same time. I hope to fit under 24GB VRAM which would mean I will also evaluate 4 bit quantization.
-My initial testing was checking if model finds:
 Overflow: `"what is mistake in following C++ code: int a = 1e9+7; int b = 1e9+9; int c = a*b; cout << c;"`
@@ -30,15 +35,26 @@ Out of bounds: `"what is bug in the following C++ code: int a = 100; vector <int
 and propose using "docker update" for `"how to stop docker container so it doesnt start every reboot"`
 I have run those prompts in the loop, with different presets and ended up picking this preset:
 `['temperature'] = 1.31`
 `['top_p'] = 0.29`
 `['top_k'] = 72`
 `['repetition_penalty'] = 1.09`
-Testing of the above prompts has shown that Hermes Coder CT2 was not able to answer correctly most of the time while Wizard Coder and this merge did. Merged model seems to retain ability to use "### Input:" in the prompt and became more sensitive to non-coding instruction. (Wizard Coder almost completely disregard it)
-In the bottom you can see EvalPlus benchmarks of three mentioned models - seems they all performed in similar way with default preset. I'm not sure if I'm not doing benchmark right or those quants are not working properly. As I noticed myself custom preset improved the result. I will rerun benchmarks as the last result seems to hint some randomness.
 **For summarization I propose following prompt:**
@@ -131,7 +147,7 @@ and
 `{'pass@1': 0.3719512195121951}`
 --------------

 ---
 license: openrail
 ---
+**This model is a merge between 66% of Wizard Coder and 33% of Redmond Hermes Coder (which is a Wizard Coder fine-tune):**
 https://huggingface.co/NousResearch/Redmond-Hermes-Coder
 https://huggingface.co/WizardLM/WizardCoder-15B-V1.0
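The README describes the merge as "the most basic value average". A minimal sketch of what such a weighted average could look like is below. It is not the author's actual merge script: plain Python lists stand in for tensors, `merge_weighted` and the toy parameter names are hypothetical, and the weights are assumed to be normalized so that 0.66 and 0.33 sum to 1.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat values.
Weights = Dict[str, List[float]]


def merge_weighted(a: Weights, b: Weights, w_a: float, w_b: float) -> Weights:
    """Element-wise weighted average of two parameter sets with identical layouts."""
    if a.keys() != b.keys():
        raise ValueError("both models must expose the same parameter names")
    total = w_a + w_b  # assumption: normalize so the weights sum to 1
    merged: Weights = {}
    for name, values_a in a.items():
        values_b = b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            (w_a * x + w_b * y) / total for x, y in zip(values_a, values_b)
        ]
    return merged


# Toy example: 0.66 of "wizard" and 0.33 of "hermes", as in the model name.
wizard = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
hermes = {"layer.weight": [4.0, 5.0], "layer.bias": [3.0]}
merged = merge_weighted(wizard, hermes, 0.66, 0.33)
```

This only works when both checkpoints share the same architecture and parameter names, which is plausible here because Redmond Hermes Coder is a fine-tune of Wizard Coder.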
 Using CTranslate2 for quantization and inference achieving as much as 37 tokens /s on RTX 3090 GPU.
+Inference is done using text-generation-webui:
+I added the code from this pull request and ran an update on requirements.txt: https://github.com/oobabooga/text-generation-webui/pull/2828
 There is one thing extra to be changed in the code: reply = apply_extensions('output', reply) to: reply = apply_extensions('output', reply, state)
+The idea was to get back some of the coding abilities that were lost in the fine-tune while retaining at least basic capabilities to summarize text and work with context. This experiment was also focused on using CT2 for its speed.
+**I believe the presented approach is the best available compromise between speed, coding accuracy, and a little general LLM use.**
+**Please note that CT2 8bit quant seems to have better HumanEval scores than load-in-8bit**
+The community now mostly focuses on teaching non-coding models to code, as making coding models more general seems nearly impossible.
+However, my daily use is focused on DevOps questions, summarizing content, and script development. Further development will be around intent analysis for integration with TODO lists and a calendar, extracting actions and notes from my voice transcriptions. This model doesn't seem to work well enough on those tasks, so next time I will attempt actual fine-tunes of Wizard Coder or just run two models at the same time. I hope to fit under 24GB of VRAM, which would mean I will also evaluate 4-bit quantization.
+My initial testing was checking if the model finds:
 Overflow: `"what is mistake in following C++ code: int a = 1e9+7; int b = 1e9+9; int c = a*b; cout << c;"`
 and propose using "docker update" for `"how to stop docker container so it doesnt start every reboot"`
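As a quick check of why the overflow prompt has a definite right answer, the arithmetic can be worked out exactly. This sketch assumes `int` is 32 bits wide, which is typical but not guaranteed by the C++ standard.

```python
INT32_MAX = 2**31 - 1  # assumption: a 32-bit int
INT64_MAX = 2**63 - 1

a = 10**9 + 7
b = 10**9 + 9
product = a * b  # Python integers do not overflow, so this is the exact value

fits_in_int32 = product <= INT32_MAX
fits_in_int64 = product <= INT64_MAX
print(product, fits_in_int32, fits_in_int64)
```

Both operands fit in a 32-bit `int`, but their product does not, so a correct answer should point at `a*b` and suggest a wider type such as `long long`.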
 I have run those prompts in the loop, with different presets and ended up picking this preset:
 `['temperature'] = 1.31`
 `['top_p'] = 0.29`
 `['top_k'] = 72`
 `['repetition_penalty'] = 1.09`
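Running the prompts in a loop over several presets could be sketched roughly as follows. This is an illustration, not the script used for the README: `score_presets`, the keyword-matching check, and the `hypothetical_baseline` values are assumptions, and `generate` is a placeholder for whatever call actually produces a reply from the model.

```python
from typing import Callable, Dict, List, Tuple

Preset = Dict[str, float]
# (prompt, keyword expected somewhere in a correct reply)
Case = Tuple[str, str]

PRESETS: Dict[str, Preset] = {
    # The preset picked in the README.
    "chosen": {"temperature": 1.31, "top_p": 0.29, "top_k": 72, "repetition_penalty": 1.09},
    # Made-up values, included only as a comparison point.
    "hypothetical_baseline": {"temperature": 0.7, "top_p": 0.9, "top_k": 20, "repetition_penalty": 1.15},
}

CASES: List[Case] = [
    ("how to stop docker container so it doesnt start every reboot", "docker update"),
]


def score_presets(
    cases: List[Case],
    presets: Dict[str, Preset],
    generate: Callable[[str, Preset], str],
    runs: int = 3,
) -> Dict[str, float]:
    """Return, per preset, the fraction of runs whose reply contains the keyword."""
    scores: Dict[str, float] = {}
    for name, preset in presets.items():
        hits = 0
        for prompt, keyword in cases:
            for _ in range(runs):
                reply = generate(prompt, preset)
                hits += keyword.lower() in reply.lower()
        scores[name] = hits / (len(cases) * runs)
    return scores
```

Because sampling is random, repeating each prompt several times per preset gives a steadier picture than a single run.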
+Testing of the above prompts has shown that Hermes Coder CT2 was not able to answer correctly most of the time, while Wizard Coder and this merge did. The merged model seems to retain the ability to use "### Input:" in the prompt and has become more sensitive to non-coding instructions (Wizard Coder almost completely disregards them).
+At the bottom you can see EvalPlus benchmarks of the three mentioned models. They all seem to have performed similarly with the default preset. I'm not sure whether I'm running the benchmark incorrectly or whether those quants do not work properly with the default preset. As I noticed, the custom preset considerably improved the result.
+**I would greatly appreciate it if anyone could confirm how good this model is with the proposed preset, as the result I got really positively surprised me (it seems better than any other Wizard Coder 8bit quant).**
+**CT2 int8_float16 merge, custom preset:**
+`Base`
+`{'pass@1': 0.47560975609756095}`
+`Base + Extra`
+`{'pass@1': 0.45121951219512196}`
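To make the pass@1 figures easier to read, they can be converted back into solved-problem counts. This sketch assumes one generated sample per problem, so that pass@1 is simply solved divided by total, and uses the 164 problems of the HumanEval set; `solved_count` is a hypothetical helper.

```python
HUMANEVAL_PROBLEMS = 164  # size of the HumanEval problem set


def solved_count(pass_at_1: float, total: int = HUMANEVAL_PROBLEMS) -> int:
    """Recover the number of solved problems from a single-sample pass@1 score."""
    return round(pass_at_1 * total)


reported = {
    "Base": 0.47560975609756095,
    "Base + Extra": 0.45121951219512196,
}
for name, score in reported.items():
    print(name, solved_count(score), "of", HUMANEVAL_PROBLEMS)
```

Under that assumption the two scores correspond to 78 and 74 solved problems, so the extra tests cost the merged model four problems.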
 **For summarization I propose following prompt:**
 `{'pass@1': 0.3719512195121951}`
 --------------