brittlewis12 committed on
Commit
c1dc27a
1 Parent(s): e73e258

Create README.md

Files changed (1)
  1. README.md +80 -0
README.md ADDED
@@ -0,0 +1,80 @@
+ ---
+ base_model: m-a-p/OpenCodeInterpreter-DS-6.7B
+ inference: false
+ language:
+ - en
+ license: apache-2.0
+ license_link: https://github.com/OpenCodeInterpreter/OpenCodeInterpreter/blob/main/LICENSE
+ model_creator: m-a-p
+ model_name: OpenCodeInterpreter-DS-6.7B
+ model_type: llama
+ pipeline_tag: text-generation
+ tags:
+ - code
+ quantized_by: brittlewis12
+ ---
+
+ ![HumanEval-pass@1](https://opencodeinterpreter.github.io/static/images/figure1.png)
+
+ # OpenCodeInterpreter-DS-6.7B GGUF
+
+
+ **Original model**: [OpenCodeInterpreter-DS-6.7B](https://huggingface.co/m-a-p/OpenCodeInterpreter-DS-6.7B)
+
+ **Model creator**: [Multimodal Art Projection Research Community](https://huggingface.co/m-a-p)
+
+ This repo contains GGUF format model files for Multimodal Art Projection Research Community (M-A-P)’s OpenCodeInterpreter-DS-6.7B.
+
+ > The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address this, we introduce OpenCodeInterpreter, a family of open-source code systems designed for generating, executing, and iteratively refining code. Supported by Code-Feedback, a dataset featuring 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement. Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance. Notably, OpenCodeInterpreter-33B achieves an accuracy of 83.2 (76.4) on the average (and plus versions) of HumanEval and MBPP, closely rivaling GPT-4's 84.2 (76.2) and further elevates to 91.6 (84.6) with synthesized human feedback from GPT-4. OpenCodeInterpreter brings the gap between open-source code generation models and proprietary systems like GPT-4 Code Interpreter.
+
+ Learn more on M-A-P’s [Model page](https://opencodeinterpreter.github.io/).
+
+ ### What is GGUF?
+
+ GGUF is a file format for representing AI models. It is the third version of the format, introduced by the llama.cpp team on August 21st, 2023. It replaces GGML, which llama.cpp no longer supports.
+ Converted using llama.cpp build 2249 (revision [15499eb](https://github.com/ggerganov/llama.cpp/commit/15499eb94227401bdc8875da6eb85c15d37068f7)).
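As a rough illustration of the format, a GGUF file opens with the 4-byte magic `GGUF` followed by a little-endian `uint32` version. The sketch below reads those fields plus the two counts that follow, assuming the version 2 or 3 header layout (64-bit tensor and metadata counts). The names `GGUFHeader` and `read_gguf_header` are hypothetical helpers for this example and are not part of llama.cpp.

```python
import struct
from typing import NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(path: str) -> GGUFHeader:
    """Read the fixed-size header at the start of a GGUF file.

    Assumes the v2/v3 layout: magic (4 bytes), uint32 version,
    uint64 tensor count, uint64 metadata key/value count.
    """
    with open(path, "rb") as f:
        raw = f.read(24)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # "<" selects little-endian with no padding: 4 + 8 + 8 = 20 bytes.
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return GGUFHeader(version, tensor_count, kv_count)
```

Calling `read_gguf_header` on a downloaded model file is a quick sanity check that the download is a GGUF file at all before handing it to a runtime.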
+
+ ### Prompt template
+
+ ```
+ <|User|>
+ {{prompt}}
+
+ <|Assistant|>
+
+ ```
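The template above can be filled in with plain string formatting. This is a minimal sketch for a single user turn; `build_prompt` is a hypothetical helper name, and the exact trailing whitespace a given runtime expects is an assumption taken from the template as written.

```python
def build_prompt(user_message: str) -> str:
    """Fill the prompt template with a single user turn."""
    # Mirrors the template line by line: user tag, message, blank line,
    # assistant tag, then a trailing newline where generation begins.
    return f"<|User|>\n{user_message}\n\n<|Assistant|>\n"


print(build_prompt("Write a function that reverses a string."))
```

The returned string is what would be passed as the raw prompt to whichever GGUF runtime loads the model.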
+
+ ---
+
+ ## Download & run with [cnvrs](https://twitter.com/cnvrsai) on iPhone, iPad, and Mac!
+
+ ![cnvrs.ai](https://pbs.twimg.com/profile_images/1744049151241797632/0mIP-P9e_400x400.jpg)
+
+ [cnvrs](https://testflight.apple.com/join/sFWReS7K) is the best app for private, local AI on your device:
+ - create & save **Characters** with custom system prompts & temperature settings
+ - download and experiment with any **GGUF model** you can [find on Hugging Face](https://huggingface.co/models?library=gguf)!
+ - make it your own with custom **Theme colors**
+ - powered by Metal ⚡️ & [llama.cpp](https://github.com/ggerganov/llama.cpp), with **haptics** during response streaming!
+ - **try it out** yourself today, on [TestFlight](https://testflight.apple.com/join/sFWReS7K)!
+ - follow [cnvrs on Twitter](https://twitter.com/cnvrsai) to stay up to date
+
+ ---
+
+ ## Original Model Evaluation
+
+ > The study leverages data from the EvalPlus leaderboard, examining OpenCodeInterpreter's performance against benchmarks such as GPT-3.5/4-Turbo, CodeLlama-Python, WizardCoder, Deepseek-Coder, and CodeT5+ across various scales on the HumanEval and MBPP benchmarks and their advanced versions. For multi-turn code generation, the focus shifts to assessing OpenCodeInterpreter's capability in iterative refinement through a two-round limit, considering execution feedback and human feedback scenarios. The experimental setup aims to highlight OpenCodeInterpreter's adaptability and proficiency in code generation, underscored by its achievements in setting new standards in software development tools through iterative feedback and refinement.
+
+ For more detail on the evaluation process, see the [main results](https://opencodeinterpreter.github.io/#mainresults) and the eval code [README](https://github.com/OpenCodeInterpreter/OpenCodeInterpreter/blob/f5bcecc42f84b4b789757daf3af476fcfa8b9d79/evaluation/README.md).
+
+ | Model | HumanEval (+) | MBPP (+) | Average (+) |
+ |----------------------|---------------|----------|-------------|
+ | OpenCodeInterpreter-DS-6.7B | 76.2 (72.0) | 73.9 (63.7) | 75.1 (67.9) |
+ | --> with Execution Feedback | 81.1 (78.7) | 82.7 (72.4) | 81.9 (75.6) |
+ | --> with Synth. Human Feedback | 87.2 (**86.6**) | 86.2 (74.2) | 86.7 (80.4) |
+ | --> with Synth. Human Feedback (Oracle) | **89.7 (86.6)** | **87.2 (75.2)** | **88.5 (80.9)** |
+ | — | — | — | — |
+ | GPT-4-Turbo | 85.4 (81.7) | 83.0 (70.7) | 84.2 (76.2) |
+ | --> with Execution Feedback | **88.0 (84.2)** | **92.0 (78.2)** | **90.0 (81.2)** |
+ | — | — | — | — |
+ | GPT-3.5-Turbo | 72.6 (65.9) | 81.7 (69.4) | 77.2 (67.7) |
+ | --> with Execution Feedback | 76.8 (70.7) | 87.0 (73.9) | 81.9 (72.3) |
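The Average (+) column appears to be the arithmetic mean of the HumanEval and MBPP scores, rounded half-up to one decimal place. That reading is inferred from the numbers in the table rather than stated by the source, so treat the sketch below as a consistency check under that assumption. `mean_score` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP


def mean_score(humaneval: str, mbpp: str) -> Decimal:
    """Mean of two benchmark scores, rounded half-up to one decimal place."""
    # Decimal avoids binary floating-point surprises on values like 75.05.
    mean = (Decimal(humaneval) + Decimal(mbpp)) / 2
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Base and plus scores for OpenCodeInterpreter-DS-6.7B, copied from the table.
base = mean_score("76.2", "73.9")
plus = mean_score("72.0", "63.7")
print(f"{base} ({plus})")  # prints "75.1 (67.9)"
```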