KrisPi/Wizard-Coder-0.66-Redmond-Hermes-0.33-ct2fast


Commit c888f6b (parent 2854d1d), "Update README.md", committed by KrisPi on Aug 18, 2023.

---
license: openrail
---

**This model is a merge of 66% Wizard Coder and 33% Redmond Hermes Coder (itself a fine-tune of Wizard Coder):**

- Redmond Hermes Coder: https://huggingface.co/NousResearch/Redmond-Hermes-Coder
- Wizard Coder: https://huggingface.co/WizardLM/WizardCoder-15B-V1.0

The merge was done with the most basic method: a plain weighted average of the two models' weight values.

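
A minimal sketch of that kind of merge, assuming both checkpoints share the same architecture and parameter names (they should, since Redmond Hermes Coder is a fine-tune of Wizard Coder) and assuming the 66%/33% split means weights of 2/3 and 1/3 that sum to 1. The function name is hypothetical. With real checkpoints the values would be PyTorch tensors from each model's state dict; the same arithmetic applies element-wise.

```python
from typing import Mapping, TypeVar

# Any value type supporting scalar * and +: floats, NumPy arrays, torch tensors.
T = TypeVar("T")


def average_merge(
    state_a: Mapping[str, T],
    state_b: Mapping[str, T],
    weight_a: float = 2 / 3,
) -> dict[str, T]:
    """Weighted average of two state dicts with identical parameter names.

    weight_a is the share of model A (here: Wizard Coder); model B
    (here: Redmond Hermes Coder) gets 1 - weight_a, so the weights sum to 1.
    """
    if set(state_a) != set(state_b):
        raise ValueError("checkpoints have different parameter names")
    weight_b = 1.0 - weight_a
    return {name: weight_a * state_a[name] + weight_b * state_b[name] for name in state_a}
```

Normalizing the weights to sum to 1 keeps the merged values on the same scale as the originals; literal weights of 0.66 and 0.33 would shrink every parameter by 1%.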
CTranslate2 (CT2) is used for quantization and inference, reaching up to 37 tokens/s on an RTX 3090 GPU.

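
A sketch of CTranslate2 inference with the custom preset given further down, assuming the `ctranslate2` and `transformers` packages, a CUDA GPU, and a local copy of this converted model whose directory also holds the tokenizer files. `ctranslate2.Generator`, `generate_batch`, and its `sampling_*` and `repetition_penalty` options come from the CTranslate2 Python API; the helper names, prompt, and path are hypothetical, and this is not the text-generation-webui code path the card actually used.

```python
import time

# Custom preset from this card, in text-generation-webui naming.
CUSTOM_PRESET = {"temperature": 1.31, "top_p": 0.29, "top_k": 72, "repetition_penalty": 1.09}


def preset_to_ct2_kwargs(preset: dict) -> dict:
    """Map webui-style sampling settings onto Generator.generate_batch options.

    Check the CTranslate2 docs for how sampling_topk=0 behaves before reusing
    this with a preset that sets top_k to 0.
    """
    return {
        "sampling_temperature": preset["temperature"],
        "sampling_topp": preset["top_p"],
        "sampling_topk": preset["top_k"],
        "repetition_penalty": preset["repetition_penalty"],
    }


def load(model_dir: str):
    """Load the converted model and its tokenizer once (tokenizer files assumed in model_dir)."""
    import ctranslate2  # imported lazily so the helper above works without a GPU
    import transformers

    generator = ctranslate2.Generator(model_dir, device="cuda", compute_type="int8_float16")
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_dir)
    return generator, tokenizer


def generate(generator, tokenizer, prompt: str, max_length: int = 512) -> tuple[str, float]:
    """Sample one completion; return (text, generated tokens per second)."""
    prompt_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    start = time.perf_counter()
    results = generator.generate_batch(
        [prompt_tokens],
        max_length=max_length,
        include_prompt_in_result=False,
        **preset_to_ct2_kwargs(CUSTOM_PRESET),
    )
    elapsed = time.perf_counter() - start
    output_ids = results[0].sequences_ids[0]
    return tokenizer.decode(output_ids), len(output_ids) / elapsed
```

Usage: `generator, tokenizer = load("path/to/model")`, then `text, tps = generate(generator, tokenizer, prompt)`. The timing covers prompt processing as well as decoding, so it slightly understates pure generation speed.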
Inference was done with text-generation-webui, after adding the code from https://github.com/oobabooga/text-generation-webui/pull/2828 and running an update with its requirements.txt. One extra code change is needed: replace `reply = apply_extensions('output', reply)` with `reply = apply_extensions('output', reply, state)`.

The idea was to recover some of the coding ability lost in the Redmond Hermes Coder fine-tune while retaining at least a basic ability to summarize text and work with context. The experiment also focused on CT2 for its speed. **I believe this approach is the best available compromise between speed, coding accuracy, and a little general LLM use. ~~Please note that the CT2 8-bit quant seems to score better on HumanEval than load-in-8bit.~~**

The community now mostly focuses on making non-coding models code, as making coding models more general seems nearly impossible.

However, my daily use centers on DevOps questions, summarizing content, and script development. Further work will focus on intent analysis: extracting actions and notes from my voice transcriptions for integration with TODO lists and a calendar. This model does not seem to handle those tasks well enough, so next I will try actual fine-tunes of Wizard Coder, or simply running two models at the same time. I hope to fit within 24 GB of VRAM, which means I will also evaluate 4-bit quantization.

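
A back-of-the-envelope check of why two models within 24 GB (the RTX 3090's VRAM) points toward 4-bit. It assumes roughly 15.5 billion parameters per model (WizardCoder-15B is built on the 15.5B StarCoder) and counts weights only; the KV cache, activations, and CUDA context need memory on top of this. The helper name is hypothetical.

```python
def weights_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 15.5e9  # assumption: ~15.5B parameters per model
VRAM_GB = 24     # RTX 3090

for bits in (16, 8, 4):
    one = weights_gb(PARAMS, bits)
    two = 2 * one
    verdict = "fits" if two < VRAM_GB else "does not fit"
    print(f"{bits:>2}-bit: {one:5.2f} GB per model, {two:5.2f} GB for two ({verdict} in {VRAM_GB} GB)")
```

At 8 bits the weights of two models alone already exceed 24 GB, while at 4 bits they leave roughly 8.5 GB for everything else.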
My initial testing checked whether the model:

- finds the overflow in `"what is mistake in following C++ code: int a = 1e9+7; int b = 1e9+9; int c = a*b; cout << c;"`
- finds the out-of-bounds access in `"what is bug in the following C++ code: int a = 100; vector <int> b(a); b[a] = 20; cout << b[a] << '\n';"`
- proposes using `docker update` for `"how to stop docker container so it doesnt start every reboot"`

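
For reference, the overflow the first prompt should surface, worked out by hand for the usual 32-bit `int` (in C++ signed overflow is undefined behavior, so the printed value is not even guaranteed):

$$
(10^9+7)(10^9+9) = 10^{18} + 16\cdot 10^{9} + 63 = 1\,000\,000\,016\,000\,000\,063 \gg 2^{31}-1 = 2\,147\,483\,647
$$

For the second prompt, `vector <int> b(a)` with `a = 100` has valid indices 0 to 99, so `b[a]` writes and reads one element past the end.
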
I ran those prompts in a loop with different presets and ended up picking this one:

- `['temperature'] = 1.31`
- `['top_p'] = 0.29`
- `['top_k'] = 72`
- `['repetition_penalty'] = 1.09`

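
A sketch of such a loop, assuming a hypothetical `generate(prompt, preset)` callable that wraps whichever backend is in use. The card does not say how answers were judged, so the keyword check below is an assumed, deliberately crude pass criterion; the `default` preset is the llm-humaneval-benchmarks default listed in the benchmark section.

```python
from typing import Callable

# The card's three test prompts, each with keywords a correct answer should
# mention (any one is enough). The keywords are an assumption.
CHECKS = {
    "what is mistake in following C++ code: int a = 1e9+7; int b = 1e9+9; int c = a*b; cout << c;":
        ("overflow",),
    "what is bug in the following C++ code: int a = 100; vector <int> b(a); b[a] = 20; cout << b[a] << '\\n';":
        ("out of bounds", "out-of-bounds", "out of range"),
    "how to stop docker container so it doesnt start every reboot":
        ("docker update",),
}

PRESETS = {
    "custom": {"temperature": 1.31, "top_p": 0.29, "top_k": 72, "repetition_penalty": 1.09},
    "default": {"temperature": 1, "top_p": 1, "top_k": 0, "repetition_penalty": 1},
}


def passes(answer: str, keywords: tuple[str, ...]) -> bool:
    """True if the answer mentions any expected keyword (case-insensitive)."""
    text = answer.lower()
    return any(k in text for k in keywords)


def score_presets(
    generate: Callable[[str, dict], str],
    presets: dict[str, dict],
    runs: int = 5,
) -> dict[str, float]:
    """Run every prompt `runs` times per preset; return the pass rate per preset."""
    scores = {}
    for name, preset in presets.items():
        passed = total = 0
        for prompt, keywords in CHECKS.items():
            for _ in range(runs):
                passed += passes(generate(prompt, preset), keywords)
                total += 1
        scores[name] = passed / total
    return scores
```

Repeating each prompt several times matters with sampling presets like these, since a single answer says little about how often a preset gets it right.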
Testing with the prompts above showed that Hermes Coder (CT2) could not answer correctly most of the time, while Wizard Coder and this merge could. The merged model seems to retain the ability to use `### Input:` in the prompt and became more responsive to non-coding instructions, which Wizard Coder almost completely disregards.

At the bottom are EvalPlus benchmarks of the three models mentioned; with the default preset they all seem to perform similarly. I am not sure whether I am running the benchmark incorrectly or the quantized models are not working properly. As I had noticed myself, the custom preset improved the results. I will rerun the benchmarks, as the last result seems to hint at some randomness.

**For summarization, I propose the following prompt:**

`Below is an instruction that describes a task. Write a response that appropriately completes the request.`
`### Instruction:`
`Please provide a concise, summary for each topic presented in the input below. Ensure clarity, coherence, and avoid redundant information.`
`### Input:`
`[CONTENT TO SUMMARIZE]`
`### Response:The summary for each topic presented in the input is as follows:`

**Optionally, iterate over the output with the following prompt:**

`Below is an instruction that describes a task. Write a response that appropriately completes the request.`
`### Instruction:`
`Rewrite summary from Input. Fix typos, add missing spaces. Ensure clarity, coherence, and remove redundant information.`
`### Input:`
`[OUTPUT FROM PREVIOUS PROMPT]`
`### Response:`

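
The two summarization prompts can be chained as sketched here. This assumes a hypothetical `generate(prompt)` callable that returns only newly generated text, and assumes the template lines are joined with single newlines as shown. The first prompt primes the answer with the summary sentence directly after `### Response:`, exactly as written in the card.

```python
from typing import Callable

HEADER = ("Below is an instruction that describes a task. "
          "Write a response that appropriately completes the request.")
SUMMARIZE = ("Please provide a concise, summary for each topic presented in the input below. "
             "Ensure clarity, coherence, and avoid redundant information.")
REWRITE = ("Rewrite summary from Input. Fix typos, add missing spaces. "
           "Ensure clarity, coherence, and remove redundant information.")
SUMMARY_PRIMER = "The summary for each topic presented in the input is as follows:"


def build_prompt(instruction: str, content: str, primer: str = "") -> str:
    """Instruction/Input/Response prompt laid out as in the card's templates."""
    return "\n".join([
        HEADER,
        "### Instruction:",
        instruction,
        "### Input:",
        content,
        f"### Response:{primer}",  # the primer follows the tag directly, as in the card
    ])


def summarize(generate: Callable[[str], str], content: str, refine: bool = True) -> str:
    """First pass summarizes; the optional second pass cleans up that summary."""
    summary = generate(build_prompt(SUMMARIZE, content, SUMMARY_PRIMER))
    if refine:
        summary = generate(build_prompt(REWRITE, summary))
    return summary
```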
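**HumanEval** was run using https://github.com/my-other-github-account/llm-humaneval-benchmarks/ and:

`sudo docker run -v $(pwd):/app ganler/evalplus:latest --dataset humaneval --samples results/{model_name}.jsonl`

In the results below, "Base" uses the original HumanEval tests and "Base + Extra" adds the extra HumanEval+ tests from EvalPlus.
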
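
The samples file passed to `--samples` is JSON Lines, one object per HumanEval task. A minimal sketch of writing it, assuming the `task_id`/`completion` layout described in the EvalPlus README and a hypothetical `completions` mapping produced by the benchmark script; the function name is illustrative.

```python
import json
from pathlib import Path


def write_samples(completions: dict[str, str], model_name: str, out_dir: str = "results") -> Path:
    """Write one {"task_id", "completion"} object per line to out_dir/{model_name}.jsonl."""
    path = Path(out_dir) / f"{model_name}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for task_id, completion in completions.items():
            # json.dumps escapes newlines, so each sample stays on one line.
            f.write(json.dumps({"task_id": task_id, "completion": completion}) + "\n")
    return path
```

For example, `write_samples({"HumanEval/0": "    return ...\n"}, "wizard-coder-ct2")` produces `results/wizard-coder-ct2.jsonl`, matching the path in the `docker run` command when run from the mounted directory.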
**Custom preset:**

- `['temperature'] = 1.31`
- `['top_p'] = 0.29`
- `['top_k'] = 72`
- `['repetition_penalty'] = 1.09`

| Model (custom preset) | Base pass@1 | Base + Extra pass@1 |
|---|---|---|
| CT2 int8_float16 merge (this model) | 0.47560975609756095 | 0.45121951219512196 |
| CT2 int8_float16 Wizard Coder | 0.43902439024390244 | 0.3597560975609756 |
| CT2 int8_float16 Wizard Coder (retry) | 0.42073170731707316 | 0.3475609756097561 |
| Full-weight Wizard Coder loaded with `--load-in-8bit` | 0.3475609756097561 | 0.3170731707317073 |


---

**Default llm-humaneval-benchmarks preset:**

- `['temperature'] = 1`
- `['top_p'] = 1`
- `['top_k'] = 0`
- `['repetition_penalty'] = 1`

| Model (default preset) | Base pass@1 | Base + Extra pass@1 |
|---|---|---|
| CT2 int8_float16 merge (this model) | 0.4634146341463415 | 0.4024390243902439 |
| CT2 int8_float16 Redmond Hermes Coder | 0.4695121951219512 | 0.4146341463414634 |
| CT2 int8_float16 Wizard Coder | 0.4695121951219512 | 0.3902439024390244 |
| Full-weight Wizard Coder loaded with `--load-in-8bit` | 0.43902439024390244 | 0.3719512195121951 |

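
To judge how large these differences are, it can help to convert pass@1 back into solved problems. HumanEval has 164 tasks, and every value above is a multiple of 1/164, which is consistent with one sample per task. The sketch below does that conversion and adds a rough binomial standard error; treating tasks as independent draws is a simplifying assumption, not something the benchmark guarantees.

```python
import math

N_TASKS = 164  # number of HumanEval problems


def solved(pass_at_1: float, n: int = N_TASKS) -> int:
    """Number of tasks solved, assuming one sample per task."""
    return round(pass_at_1 * n)


def std_error(pass_at_1: float, n: int = N_TASKS) -> float:
    """Rough binomial standard error of a pass rate measured on n tasks."""
    return math.sqrt(pass_at_1 * (1 - pass_at_1) / n)


# CT2 int8_float16 Wizard Coder with the custom preset: first run vs. retry (Base).
for label, p in [("run 1", 0.43902439024390244), ("retry", 0.42073170731707316)]:
    print(f"{label}: {solved(p)}/{N_TASKS} solved, pass@1 = {p:.3f} ± {std_error(p):.3f}")
```

The first run and the retry differ by 3 solved tasks (72 vs. 69), which is small next to a standard error of about 6 tasks, in line with the randomness the card mentions.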

---