---
base_model: elinas/chronos-33b
tags:
- roleplay
- storywriting
- llama1
- finetune
- transformers
- pytorch
---

# Zeus Labs ~ Chronos-Divergence-33B

![image/png](https://cdn-uploads.huggingface.co/production/uploads/630417380907b9a115c6aa9f/tJ3zBlUS83BzKx0G0VutU.png)

The original model, LLaMA 1, was pre-trained at a sequence length of 2,048 tokens. We went through two individual training runs targeting a sequence length of 16,384, a significant increase over the original length. While LLaMA 1 was originally pre-trained on 1.4T tokens, it responded positively to our 500M-token train and will write coherently and keep the same writing format (granted some caveats) up to roughly 12K tokens fairly consistently.

Chronos-Divergence-33B is a one-of-a-kind model based on the original [Chronos-33B](https://huggingface.co/elinas/chronos-33b) and now focuses on prompt adherence for *roleplay* and storywriting.
It was trained at 16,384 tokens and can go up to around 12,000 tokens before any deterioration, without the use of RoPE scaling or other context-extension techniques.

**The unique aspect of this model is that it has little to no "GPT-isms", commonly referred to as "slop": the repetitive phrases many modern LLMs output due to their pre-training and finetuning datasets. We completely cleaned our datasets and relied on the original "charm" of the L1 series, and we might bring this to more of the smaller models if it gains traction. It also avoids ["purple prose"](https://en.wikipedia.org/wiki/Purple_prose) in the same way.**

RoPE scaling and RULER have not been tested, as we are satisfied with our results. We will also run evaluations, but we are not expecting much from a dated model focused on RP intelligence.

A next step would be to implement GQA (Grouped Query Attention): as the number of input tokens increases, so does memory usage, and this technique has been shown to reduce that memory burden. This will require significant effort on our part (help welcome!), and we hope that quantizations will be sufficient in the meantime.
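As a sketch only (this is not our training code), the core idea of GQA in PyTorch is that a small number of key/value heads are shared across groups of query heads, which shrinks the KV cache that grows with context length. Dimensions and names below are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Toy GQA block: n_kv_heads < n_heads, so the KV cache is smaller."""

    def __init__(self, d_model=512, n_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Each KV head serves (n_heads // n_kv_heads) query heads.
        repeat = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

x = torch.randn(1, 16, 512)
print(GroupedQueryAttention()(x).shape)  # torch.Size([1, 16, 512])
```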

The datasets used do not have a planned release date. It is less the data and more the technique that made this "dated" model special and unlike anything many of us have experienced before: modernizing the model without the common phrases GPTs like to output today, though making it uncensored as a result.

Without spoiling anything, the name of the model and presented character have meaning... Look up Steins;Gate if you are not familiar :)

## Instruct Template

This model uses `ChatML` - below is an example. It is a preset in many frontends.

```
<|im_start|>system
A system prompt describing how you'd like your bot to act.<|im_end|>
<|im_start|>user
Hello there!<|im_end|>
<|im_start|>assistant
I can assist you or we can discuss other things?<|im_end|>
<|im_start|>user
I was wondering how transformers work?<|im_end|>
<|im_start|>assistant
```
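If your frontend does not ship a ChatML preset, a minimal sketch like the one below can build the prompt string by hand (the helper name and stop string are our own illustration; any backend that accepts a raw prompt will work):

```python
def format_chatml(messages):
    """Format a list of {"role", "content"} dicts into a ChatML prompt string."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Leave the assistant turn open so the model completes it.
    return prompt + "<|im_start|>assistant\n"

messages = [
    {"role": "system", "content": "A system prompt describing how you'd like your bot to act."},
    {"role": "user", "content": "Hello there!"},
]
print(format_chatml(messages))  # send this to your backend; stop on "<|im_end|>"
```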

## Quantization
Please note that we tested this model in BF16/FP16 and 8-bit. Results are not expected to be the same when going below 8-bit quantization.
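For reference, a minimal sketch of an 8-bit load with `transformers` and `bitsandbytes` might look like the following (the repo id and dtype choices here are illustrative, not a prescribed setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "ZeusLabs/Chronos-Divergence-33B"  # assumed repo id for the full weights

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # lowest precision we tested
    device_map="auto",
    torch_dtype=torch.bfloat16,  # BF16 for the non-quantized parts
)
```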

#### LlamaCPP
[@bartowski](https://huggingface.co/bartowski/Chronos-Divergence-33B-GGUF)

[@mradermacher](https://huggingface.co/mradermacher/Chronos-Divergence-33B-i1-GGUF)

#### Exllama2
[@elinas - 8.0bpw](https://huggingface.co/ZeusLabs/Chronos-Divergence-33B-exl2-8.0bpw)

[@SicariusSicariiStuff - 6.0bpw](https://huggingface.co/SicariusSicariiStuff/ZeusLabs_Chronos-Divergence-33B-EXL2-6.0bpw)

[@SicariusSicariiStuff - 4.0bpw](https://huggingface.co/SicariusSicariiStuff/ZeusLabs_Chronos-Divergence-33B-EXL2-4.0bpw)

[More quants available here](https://huggingface.co/collections/SicariusSicariiStuff/zeuslabs-chronos-divergence-33b-exl2-quants-66e218145b1fc436d9e56d6f)

#### FP8
[@SicariusSicariiStuff](https://huggingface.co/SicariusSicariiStuff/ZeusLabs_Chronos-Divergence-33B_FP8)

## Sampling Settings
Here are some settings that work well with this model:
```
Temp -> 0.7 (1.0 max)
Min P -> 0.05-0.10
Presence Penalty -> 1.0
Repetition Penalty range -> 2800
```
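As a rough illustration only, these values might map onto a llama.cpp `llama-server` completion request as sketched below (assuming you run one of the GGUF quants above; the prompt, port, and token budget are placeholders):

```python
import requests

payload = {
    "prompt": "<|im_start|>user\nHello there!<|im_end|>\n<|im_start|>assistant\n",
    "temperature": 0.7,        # Temp 0.7 (1.0 max)
    "min_p": 0.05,             # Min P 0.05-0.10
    "presence_penalty": 1.0,   # Presence Penalty 1.0
    "repeat_last_n": 2800,     # Repetition Penalty range 2800
    "stop": ["<|im_end|>"],
    "n_predict": 256,
}
resp = requests.post("http://localhost:8080/completion", json=payload)
print(resp.json()["content"])
```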

## Credit
Thank you to my team consisting of [@Fizzarolli](https://huggingface.co/Fizzarolli) and [@ToastyPigeon](https://huggingface.co/ToastyPigeon) and myself [@elinas](https://huggingface.co/elinas).

Fizz graciously provided the compute for us to run this (dumb) but fun experiment, while Toasty assisted in dataset preparation! I handled the MLOps in the meantime.


## Additional Details 

Please be mindful of the license. This model is strictly non-commercial under the Meta LLaMA terms, but you are free to use it personally at your leisure.

If you have any questions or concerns, please post in the community tab.

DISCLAIMER: Outputs generated by the model are not reflective of our views.