Quantization made by Richard Erkhov.
[Github](https://github.com/RichardErkhov)
[Discord](https://discord.gg/pvy7H8DZMG)
[Request more models](https://github.com/RichardErkhov/quant_request)
Refact-1_6B-fim - GGUF
- Model creator: https://huggingface.co/smallcloudai/
- Original model: https://huggingface.co/smallcloudai/Refact-1_6B-fim/
| Name | Quant method | Size |
| ---- | ---- | ---- |
| [Refact-1_6B-fim.Q2_K.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q2_K.gguf) | Q2_K | 0.58GB |
| [Refact-1_6B-fim.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.IQ3_XS.gguf) | IQ3_XS | 0.64GB |
| [Refact-1_6B-fim.IQ3_S.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.IQ3_S.gguf) | IQ3_S | 0.67GB |
| [Refact-1_6B-fim.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q3_K_S.gguf) | Q3_K_S | 0.67GB |
| [Refact-1_6B-fim.IQ3_M.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.IQ3_M.gguf) | IQ3_M | 0.7GB |
| [Refact-1_6B-fim.Q3_K.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q3_K.gguf) | Q3_K | 0.74GB |
| [Refact-1_6B-fim.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q3_K_M.gguf) | Q3_K_M | 0.74GB |
| [Refact-1_6B-fim.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q3_K_L.gguf) | Q3_K_L | 0.8GB |
| [Refact-1_6B-fim.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.IQ4_XS.gguf) | IQ4_XS | 0.82GB |
| [Refact-1_6B-fim.Q4_0.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q4_0.gguf) | Q4_0 | 0.86GB |
| [Refact-1_6B-fim.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.IQ4_NL.gguf) | IQ4_NL | 0.86GB |
| [Refact-1_6B-fim.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q4_K_S.gguf) | Q4_K_S | 0.86GB |
| [Refact-1_6B-fim.Q4_K.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q4_K.gguf) | Q4_K | 0.9GB |
| [Refact-1_6B-fim.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q4_K_M.gguf) | Q4_K_M | 0.9GB |
| [Refact-1_6B-fim.Q4_1.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q4_1.gguf) | Q4_1 | 0.94GB |
| [Refact-1_6B-fim.Q5_0.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q5_0.gguf) | Q5_0 | 1.03GB |
| [Refact-1_6B-fim.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q5_K_S.gguf) | Q5_K_S | 1.03GB |
| [Refact-1_6B-fim.Q5_K.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q5_K.gguf) | Q5_K | 1.05GB |
| [Refact-1_6B-fim.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q5_K_M.gguf) | Q5_K_M | 1.05GB |
| [Refact-1_6B-fim.Q5_1.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q5_1.gguf) | Q5_1 | 1.12GB |
| [Refact-1_6B-fim.Q6_K.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q6_K.gguf) | Q6_K | 1.21GB |
| [Refact-1_6B-fim.Q8_0.gguf](https://huggingface.co/RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf/blob/main/Refact-1_6B-fim.Q8_0.gguf) | Q8_0 | 1.57GB |
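
Choosing a file from the table above mostly comes down to how much memory you can spare. The snippet below is a minimal sketch of that choice, not part of the original release: the sizes are copied from a subset of the table, while `pick_quant`, `download_url`, and the `overhead_gb` allowance are hypothetical names and values introduced for illustration. The URL helper assumes the usual Hugging Face convention of replacing `blob` with `resolve` for direct downloads.

```python
# Sizes in GB, copied from a representative subset of the table above.
QUANT_SIZES_GB = {
    "Q2_K": 0.58,
    "Q3_K_M": 0.74,
    "Q4_K_M": 0.90,
    "Q5_K_M": 1.05,
    "Q6_K": 1.21,
    "Q8_0": 1.57,
}

REPO = "RichardErkhov/smallcloudai_-_Refact-1_6B-fim-gguf"


def pick_quant(budget_gb, overhead_gb=0.3):
    """Return the largest quant whose file fits into the memory budget.

    overhead_gb is an assumed allowance for context buffers and runtime
    state; it is an illustrative value, not a measurement.
    """
    usable = budget_gb - overhead_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= usable}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def download_url(quant):
    """Build a direct-download URL for one quant file in this repository."""
    return f"https://huggingface.co/{REPO}/resolve/main/Refact-1_6B-fim.{quant}.gguf"


if __name__ == "__main__":
    choice = pick_quant(budget_gb=1.25)
    print(choice, download_url(choice))
```

Larger files generally preserve more of the original model's quality, so the sketch prefers the biggest file that still fits.
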
Original model description:
---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
example_title: Hello world
group: Python
license: bigscience-openrail-m
pretrain-datasets:
- books
- arxiv
- c4
- falcon-refinedweb
- wiki
- github-issues
- stack_markdown
- self-made dataset of permissive github code
datasets:
- bigcode/the-stack-dedup
- rombodawg/2XUNCENSORED_MegaCodeTraining188k
- bigcode/commitpackft
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: Refact-1.6B
results:
- task:
type: text-generation
dataset:
type: openai_humaneval
name: HumanEval
metrics:
- name: pass@1 (T=0.01)
type: pass@1
value: 32.0
verified: false
- name: pass@1 (T=0.2)
type: pass@1
value: 31.5
verified: false
- name: pass@10 (T=0.8)
type: pass@10
value: 53.0
verified: false
- name: pass@100 (T=0.8)
type: pass@100
value: 76.9
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalSynthesize Python
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 35.8
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalSynthesize JavaScript
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 31.6
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalSynthesize Java
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 29.1
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalSynthesize Go
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: -1
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalSynthesize C++
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 26.3
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalSynthesize Rust
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: -1
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalSynthesize Average
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: -1
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixTests Python
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 18.38
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixTests JavaScript
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 12.28
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixTests Java
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 15.12
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixTests Go
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: -1
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixTests C++
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 13.17
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixTests Rust
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 2.8
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixTests Average
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: -1
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixDocs Python
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 26.92
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixDocs JavaScript
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 26.85
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixDocs Java
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 30.76
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixDocs Go
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: -1
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixDocs C++
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 25.94
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixDocs Rust
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 8.44
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalFixDocs Average
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: -1
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalExplain Python
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 26.46
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalExplain JavaScript
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 17.86
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalExplain Java
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 20.94
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalExplain Go
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: -1
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalExplain C++
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 18.78
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalExplain Rust
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: -1
verified: false
- task:
type: text-generation
dataset:
type: bigcode/humanevalpack
name: HumanEvalExplain Average
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: -1
verified: false
- task:
type: text-generation
dataset:
type: mbpp
name: MBPP
metrics:
- name: pass@1 (T=0.01)
type: pass@1
value: 31.15
verified: false
- task:
type: text-generation
dataset:
type: ds1000
name: DS-1000 (Overall Completion)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 10.1
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (C++)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 21.61
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (C#)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 13.91
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (D)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 9.5
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (Go)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 53.57
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (Java)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 21.58
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (Julia)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 13.75
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (JavaScript)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 26.88
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (Lua)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 15.26
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (PHP)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 23.04
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (Perl)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 12.1
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (Python)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 29.6
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (R)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 13.77
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (Ruby)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 12.68
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (Racket)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 4.29
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (Rust)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 19.54
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (Scala)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 18.33
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (Bash)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 5.7
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (Swift)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 17.68
verified: false
- task:
type: text-generation
dataset:
type: nuprl/MultiPL-E
name: MultiPL-HumanEval (TypeScript)
metrics:
- name: pass@1 (T=0.2)
type: pass@1
value: 25
verified: false
language:
- en
---

# Refact-1.6B
Finally, the model we started training with our [blog post](https://refact.ai/blog/2023/applying-recent-innovations-to-train-model/) is ready 🎉
After fine-tuning on generated data, it beats Replit 3b, Stability Code 3b and many other models. It almost beats
StarCoder ten times the size!
Model | Size | HumanEval pass@1 | HumanEval pass@10 |
----------------------|---------------|--------------------|--------------------|
DeciCoder-1b | 1b | 19.1% | |
Refact-1.6-fim | 1.6b | 32.0% | 53.0% |
StableCode | 3b | 20.2% | 33.8% |
ReplitCode v1 | 3b | 21.9% | |
CodeGen2.5-multi | 7b | 28.4% | 47.5% |
CodeLlama | 7b | 33.5% | 59.6% |
StarCoder | 15b | 33.6% | |
Likely, it's the best model for practical use in your IDE for code completion because it's smart and fast!
You can start using it right now by downloading the
[Refact plugin](https://refact.ai/). You can host the model yourself, too, using the
[open source docker container](https://github.com/smallcloudai/refact).
And it's multi-language (see MultiPL-HumanEval and other metrics below) and it works as a chat (see the section below).
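
The pass@1 and pass@10 figures above are usually computed with the unbiased pass@k estimator introduced alongside HumanEval: generate `n` samples per problem, count the `c` samples that pass the tests, and estimate the chance that at least one of `k` drawn samples passes. The sketch below shows that estimator for a single problem. Whether the numbers on this page were produced with exactly this procedure is an assumption, and `pass_at_k` is a name chosen here for illustration.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated
    c: number of samples that passed the unit tests
    k: number of samples the user is allowed to draw
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Probability that all k drawn samples come from the n - c failing ones.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


if __name__ == "__main__":
    # 3 passing samples out of 10: pass@1 is the plain pass rate.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

The benchmark score is the mean of this value over all problems in the dataset.
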
# It Works As a Chat
The primary application of this model is code completion (infill) in multiple programming languages.
But it also works quite well as a chat model.
HumanEval results using instruction following (chat) format, against models specialized for chat only:
Model | Size | pass@1 | pass@10 |
-----------------------|--------|----------|----------|
Refact-1.6-fim | 1.6b | 38.4% | 55.6% |
StableCode-instruct | 3b | 26.9% | 36.2% |
OctoGeeX | 6b | 44.7% | |
CodeLlama-instruct | 7b | 34.8% | 64.3% |
CodeGen2.5-instruct | 7b | 36.2% | 60.87% |
CodeLlama-instruct | 13b | 42.7% | 71.6% |
StarChat-β | 15b | 33.5% | |
OctoCoder | 15b | 46.2% | |
# Example
Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:
```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "smallcloudai/Refact-1_6B-fim"
device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
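# FIM special tokens mark the prefix, the suffix, and the point where the middle is generated
prompt = '<fim_prefix>def print_hello_world():\n    """<fim_suffix>\n    print("Hello world!")<fim_middle>'
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
# do_sample=True is needed for temperature to take effect; without it generation is greedy
outputs = model.generate(inputs, max_length=100, do_sample=True, temperature=0.2)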
print("-"*80)
print(tokenizer.decode(outputs[0]))
```
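
In an editor integration the prefix and suffix come from the text before and after the cursor, so it helps to wrap prompt construction and output parsing in small helpers. The following is a minimal sketch under stated assumptions: the token strings `<fim_prefix>`, `<fim_suffix>`, `<fim_middle>` and `<|endoftext|>` follow the StarCoder-style convention and should be checked against the tokenizer's special tokens, and `build_fim_prompt` and `extract_middle` are hypothetical helper names.

```python
# Assumed special tokens; verify them against the tokenizer before relying on this.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"
END_OF_TEXT = "<|endoftext|>"


def build_fim_prompt(prefix, suffix):
    """Arrange code before and after the cursor in prefix-suffix-middle order."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def extract_middle(decoded):
    """Return only the generated infill from the decoded model output."""
    # Everything after the middle marker is the model's completion.
    _, _, middle = decoded.partition(FIM_MIDDLE)
    # Cut at the end-of-text token if the model emitted one.
    return middle.split(END_OF_TEXT, 1)[0]


if __name__ == "__main__":
    prompt = build_fim_prompt(
        prefix='def print_hello_world():\n    """',
        suffix='\n    print("Hello world!")',
    )
    # A made-up model output, used only to demonstrate the parsing step.
    fake_output = prompt + 'Print a greeting."""' + END_OF_TEXT
    print(repr(extract_middle(fake_output)))
```

The extracted string is what gets inserted at the cursor position; the prefix and suffix stay untouched.
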
# Chat Format
The same model works as chat (experimental).
```python
prompt_template = "SYSTEM {system}\n" \
                  "USER {query}\n" \
                  "ASSISTANT"
prompt = prompt_template.format(system="You are a programming assistant",
                                query="How do I sort a list in Python?")
```
# Architecture
As described in more detail in the blog post, we used:
- [ALiBi](https://arxiv.org/abs/2108.12409) based attention
- [LayerNorm](https://arxiv.org/abs/1607.06450v1) instead of [RMSNorm](https://arxiv.org/pdf/1910.07467.pdf)
- [Multi Query Attention](https://arxiv.org/abs/1911.02150)
We also used LiON, flash attention, and early dropout. None of this is so exotic that you can't run the model: see the example above.
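
To make the ALiBi item above concrete: instead of position embeddings, ALiBi adds a per-head linear penalty to the attention scores that grows with the distance between query and key. The sketch below follows the formulation in the ALiBi paper for a power-of-two head count. It is an illustration in plain Python, not the model's actual implementation, and the head count and sequence length used are arbitrary assumptions.

```python
def alibi_slopes(num_heads):
    """Per-head slopes 2^(-8/n), 2^(-16/n), ... as in the ALiBi paper.

    This simple form assumes num_heads is a power of two.
    """
    if num_heads < 1 or num_heads & (num_heads - 1):
        raise ValueError("num_heads must be a power of two")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Build bias[h][i][j], added to the attention scores before softmax.

    For key j at or before query i the bias is -slope * (i - j);
    future positions get -inf, which acts as the causal mask.
    """
    neg_inf = float("-inf")
    return [
        [
            [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for slope in alibi_slopes(num_heads)
    ]


if __name__ == "__main__":
    for row in alibi_bias(num_heads=2, seq_len=4)[0]:
        print(row)
```

Because the penalty depends only on relative distance, the same bias pattern extends to any sequence length.
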
# Pretraining
For the base model, we used our own dataset that contains code with permissive licenses only, and open text datasets.
Filtering is the key to success of this model:
- We only used text in English
- Only topics related to computer science
- Applied heavy deduplication
The text-to-code proportion was 50:50, and the model was trained for 1.2T tokens.
We don't release the base model, because its Fill-in-the-Middle (FIM) capability likes to repeat itself too much, so
its practical use is limited. But if you still want it, write us a message on Discord.
# Finetuning
We tested our hypothesis that chat data should boost base model performance in FIM and
regular left-to-right code completion. We found that a finetuning mix containing just 15% open
[code](https://huggingface.co/datasets/bigcode/commitpackft)
[instruction-following](https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k) data,
filtered for quality, improves almost all metrics.
Additionally, to improve FIM, we observed common failure modes, and prepared a synthetic dataset based on
[The Stack dedup v1.1](https://huggingface.co/datasets/bigcode/the-stack-dedup) to address them.
There is a distribution shift between typical code on the internet, and the code you write in your IDE.
The former is likely finished, so the model tries to come up with a suggestion that makes the code complete.
While you work, your code is likely to be half-written, and no single addition can repair it
fully.
In practice, the model needs a tendency to stop after adding a couple of lines, and sometimes to write
nothing at all. We found that training it on empty completions, single-line completions, and multiline
completions that end at a smaller indent, or at least at a newline, makes it much more usable. This data
made up the remaining 85% of the finetuning dataset.
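
The three kinds of completion targets described above can be sketched as a function that cuts a source file into prefix, middle, and suffix. This is an illustrative reconstruction under assumptions, not the pipeline used to build the dataset: `multiline_end` and `make_example` are hypothetical names, and the stopping rule shown (stop at the first non-blank line indented less than the first completed line) is one possible reading of "end with a smaller text indent".

```python
def indent_of(line):
    """Count the leading spaces of a line."""
    return len(line) - len(line.lstrip(" "))


def multiline_end(lines, start):
    """Index where a multiline completion starting at `start` should stop."""
    base = indent_of(lines[start])
    end = start + 1
    # Extend until a non-blank line is indented less than the first line.
    while end < len(lines):
        if lines[end].strip() and indent_of(lines[end]) < base:
            break
        end += 1
    # Do not let the completion trail off into blank lines.
    while end > start + 1 and not lines[end - 1].strip():
        end -= 1
    return end


def join(chunk):
    """Join lines back into text, one newline per line."""
    return "".join(line + "\n" for line in chunk)


def make_example(lines, start, kind):
    """Return (prefix, middle, suffix) for kind 'empty', 'single' or 'multi'."""
    if kind == "empty":
        end = start  # the model should learn to write nothing here
    elif kind == "single":
        end = start + 1
    elif kind == "multi":
        end = multiline_end(lines, start)
    else:
        raise ValueError(f"unknown kind: {kind}")
    return join(lines[:start]), join(lines[start:end]), join(lines[end:])


if __name__ == "__main__":
    source = ["def f(x):", "    if x:", "        return 1", "    return 0", "", "print(f(1))"]
    for kind in ("empty", "single", "multi"):
        print(kind, repr(make_example(source, 1, kind)[1]))
```

Each triple can then be arranged into the fill-in-the-middle format, with the middle part as the training target.
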
The final model is the result of several attempts to make it work as well as possible for code completion,
and to perform well on a wide range of metrics. The best attempt took 40B tokens.
# Limitations and Bias
The Refact-1.6B model was trained on English text, although it has seen many more languages in
code comments. Its performance on non-English languages is lower.
# Model Stats
- **Architecture:** LLAMA-like model with multi-query attention
- **Objectives:** Fill-in-the-Middle, Chat
- **Tokens context:** 4096
- **Pretraining tokens:** 1.2T
- **Finetuning tokens:** 40B
- **Precision:** bfloat16
- **GPUs:** 64 NVidia A5000
- **Training time:** 28 days
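
A few derived numbers follow from the stats above by simple arithmetic. The sketch below is a back-of-the-envelope calculation under loud assumptions: the parameter count is taken as a nominal 1.6 billion, and the 28 days are assumed to cover both pretraining and finetuning, which the list does not state.

```python
# Values from the stats list above.
PRETRAIN_TOKENS = 1.2e12
FINETUNE_TOKENS = 40e9
GPUS = 64
TRAINING_DAYS = 28

# Assumptions introduced for this estimate.
PARAMS = 1.6e9            # nominal size implied by the model name
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each weight in 16 bits

# Memory needed just to hold the unquantized weights.
weights_gb = PARAMS * BYTES_PER_PARAM_BF16 / 1e9

# Total compute budget expressed in GPU-hours.
gpu_hours = GPUS * TRAINING_DAYS * 24

# Average throughput per GPU, if the 28 days cover all training tokens.
tokens_per_gpu_second = (PRETRAIN_TOKENS + FINETUNE_TOKENS) / (gpu_hours * 3600)

if __name__ == "__main__":
    print(f"bfloat16 weights: about {weights_gb:.1f} GB")
    print(f"compute budget:   {gpu_hours} GPU-hours")
    print(f"throughput:       about {tokens_per_gpu_second:.0f} tokens/s per GPU")
```

The weight footprint also gives a reference point for the quantized files, all of which are smaller than the bfloat16 weights.
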
# License
The model is licensed under the BigScience OpenRAIL-M v1 license agreement.
# Citation
If you are using this model, please give a link to this page.