---
license: apache-2.0
datasets:
- hudsongouge/Open-Discord
language:
- en
pipeline_tag: text-generation
tags:
- casual
- chat
---
# Chatter-70M

This repository contains the Chatter-70M model, a lightweight casual chat language model, available in multiple formats:
- GGUF format (quantized and FP16 versions)
- SafeTensors format
## Model Variants

- `model-Q4_K_M.gguf`: 4-bit quantized version
- `model-Q6_K.gguf`: 6-bit quantized version
- `model-Q8_0.gguf`: 8-bit quantized version
- `model-FP16.gguf`: 16-bit floating-point version
- `model.safetensors`: original model weights in SafeTensors format
## Configuration Files

The repository includes several configuration files:

- `config.json`: model configuration
- `generation_config.json`: generation parameters
- `tokenizer_config.json`: tokenizer configuration
- `special_tokens_map.json`: special tokens mapping
- `tokenizer.json` & `tokenizer.model`: tokenizer files
- `training_args.bin`: training arguments
## Training

This model was trained for 1 epoch on the Open Discord Dataset and a few other small non-public datasets, totaling 705.8 MB of uncompressed text from Discord. It trained for approximately 14 hours on my laptop (M3 Max, 30-core GPU) with a peak learning rate of 4e-4, cosine decay, and 100 warmup steps. I used a batch size of 16 with 8 gradient accumulation steps and an AdamW optimizer with betas 0.9 and 0.99. It was trained exclusively on samples 1024 tokens in length.
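The schedule described above can be sketched as follows. This is an illustrative reconstruction, not the actual training script; `total_steps` is a placeholder, since the real step count depends on the dataset size.

```python
import math

# Hyperparameters from the description above (illustrative reconstruction).
batch_size = 16
grad_accum_steps = 8
effective_batch = batch_size * grad_accum_steps  # 128 sequences of 1024 tokens per optimizer step

peak_lr = 4e-4
warmup_steps = 100
total_steps = 10_000  # placeholder; not stated in the model card

def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))
```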
## Architecture

It uses a Llama-3 architecture:

```python
LlamaConfig(
    hidden_size=512,
    intermediate_size=1024,
    num_attention_heads=8,
    num_hidden_layers=16,
    num_key_value_heads=4,
    rms_norm_eps=1e-5,
    vocab_size=32000,
    max_position_embeddings=4096,
    torch_dtype=bfloat16,
    hidden_dropout_prob=0.2,
    attention_probs_dropout_prob=0.2,
    attention_bias=True,
)
```
It uses the Llama-2 7B tokenizer.
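A quick back-of-the-envelope check on the parameter count implied by this config (ignoring the small RMSNorm weights and attention biases, and assuming an untied `lm_head`):

```python
# Rough parameter-count estimate for the LlamaConfig above.
hidden = 512
inter = 1024
layers = 16
heads = 8
kv_heads = 4
vocab = 32000

head_dim = hidden // heads    # 64
kv_dim = kv_heads * head_dim  # 256 (grouped-query attention)

attn = hidden * hidden * 2 + hidden * kv_dim * 2  # q, o + k, v projections
mlp = hidden * inter * 3                          # gate, up, down projections
per_layer = attn + mlp
embeddings = vocab * hidden * 2                   # input embeddings + untied lm_head

total = per_layer * layers + embeddings
print(f"{total / 1e6:.1f}M parameters")  # prints "70.5M parameters", matching the model name
```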
## Chat Format

TBH, I didn't expect this model to do so well. So, the chat format is a little unconventional:

```
[username] at [timestamp]:
content
[username] at [timestamp]:
content
```

Example:

```
Hudson at 2025-03-07T20:47:54.374-05:00:
hi
Bot at 2025-03-07T20:48:17.635-05:00:
hi hudson
Hudson at 2025-03-07T20:48:17.635-05:00:
how are you?
Bot at 2025-03-07T20:48:27.635-05:00:
i am good at the moment
```
## Recommended Usage
Chatter 70M is coherent on its own, but it is very stupid. Because it was trained on discord data, it is also very sensitive to usernames. Using different usernames can produce different styles. For example, the username "Bot" will make it respond in an AI-assistant-like style. More casual discord usernames will cause it to respond more casually. I recommend playing around with this to find the best user and assistant usernames for your purpose. Also, I recommend using a low temperature (probably about 0.5) for coherent output. Here is a sample:
```
jameui at 2019-05-13T18:46:02.66-04:00:
@Deleted User the gov is a thing
themaka at 2019-05-13T18:46:09.257-04:00:
You could've got it here if you want to check and then you can't find the image.
Deleted User at 2019-05-13T18:46:21.716-04:00:
lmao
Deleted User at 2019-05-13T18:46:37.228-04:00:
they're not in the server
Deleted User at 2019-05-13T18:46:44.609-04:00:
wait what
jameui at 2019-05-13T18:46:47.762-04:00:
Lol
jameui at 2019-05-13T18:47:08.378-04:00:
I just said it was a joke
themaka at 2019-05-13T18:47:26.856-04:00:
You can, @Deleted User @Deleted User if you've got a job on this chat and then get the last one I will be banned from my server, not 1013
jameui at 2019-05-13T18:47:31.673-04:00:
lol
jameui at 2019-05-13T18:47:35.129-04:00:
lol
jameui at 2019-05-13T18:47:37.202-04:00:
oof
Deleted User at 2019-05-13T18:47:46.574-04:00:
wow
jameui at 2019-05-13T18:47:55.176-04:00:
I don't think so
themaka at 2019-05-13T18:47:56.823-04:00:
I'm not sure about that.
```
As you can see, it looks like a discord chat. However, it isn't very smart.
Because of that, I recommend averaging probabilities across this model and Llama2-7B. That produces much better results while still maintaining the casual language:
```
Hudson at 2025-03-09T08:02:57.857+05:00:
hi
NeverUsedDC at 2025-03-09T08:03:00.670+05:00:
hello
Hudson at 2025-03-09T08:03:03.128+05:00:
how are you?
NeverUsedDC at 2025-03-09T08:03:06.081+05:00:
good
Hudson at 2025-03-09T08:03:13.063+05:00:
Working on anything cool lately?
NeverUsedDC at 2025-03-09T08:03:16.373+05:00:
yeah, just finished a project for a client
Hudson at 2025-03-09T08:03:22.529+05:00:
what was the project?
NeverUsedDC at 2025-03-09T08:03:25.950+05:00:
it was a website for a new startup
Hudson at 2025-03-09T08:03:37.603+05:00:
oh, that's cool! What startup?
NeverUsedDC at 2025-03-09T08:04:47.741+05:00:
the startup is called "EcoClean" and they're focused on sustainable cleaning products
Hudson at 2025-03-09T08:05:09.089+05:00:
Cool! What's your favorite programming language btw?
NeverUsedDC at 2025-03-09T08:05:36.843+05:00:
i'm more of a backend dev, so i'd say python is my go-to language
Hudson at 2025-03-09T08:05:42.235+05:00:
How do you feel about Rust?
NeverUsedDC at 2025-03-09T08:05:46.480+05:00:
haven't used it much, but i've heard good things
Hudson at 2025-03-09T08:09:11.898+05:00:
btw, where do you live?
NeverUsedDC at 2025-03-09T08:11:04.133+05:00:
lol
Hudson at 2025-03-09T08:11:36.795+05:00:
So, you won't tell me?
NeverUsedDC at 2025-03-09T08:11:41.110+05:00:
Haha, nope
Hudson at 2025-03-09T08:12:19.944+05:00:
Okay, fine. I've been working on an AI project lately.
NeverUsedDC at 2025-03-09T08:12:24.337+05:00:
oh
Hudson at 2025-03-09T08:19:08.059+05:00:
what do you think of it?
NeverUsedDC at 2025-03-09T08:19:08.059+05:00:
it's cool
```
See? Best of both worlds. You can adjust the balance between the two models to get different results. Chatter 70M on its own generally prefers simple responses like "lol", "oh", and "idk". It can and will generate grammatically correct sentences, but most of the time they carry little real meaning or are irrelevant. That is why Llama is helpful: Llama 2 crafts longer responses and responds more like an AI assistant.
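The averaging itself is simple: convert each model's next-token logits into probabilities, then take a weighted mean. Below is a minimal numpy sketch of that idea, not the visualizer tool's actual code; it assumes both models share a vocabulary, which is why using the Llama-2 tokenizer matters.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 0.5) -> np.ndarray:
    """Temperature-scaled softmax over a vector of next-token logits."""
    z = (logits - logits.max()) / temperature
    e = np.exp(z)
    return e / e.sum()

def averaged_probs(chatter_logits: np.ndarray, llama_logits: np.ndarray,
                   weight: float = 0.5) -> np.ndarray:
    """Weighted mean of the two models' next-token distributions.

    weight=1.0 is pure Chatter, weight=0.0 is pure Llama 2; sample or pick
    argmax from the returned distribution.
    """
    return weight * softmax(chatter_logits) + (1 - weight) * softmax(llama_logits)
```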
One thing you will notice is that Chatter often responds with incomplete thoughts, perhaps just a single word. This is because on Discord, thoughts are often sent as multiple messages. I am working to add support in my app to properly represent multi-message responses.
## Running

For best results, use my LLM Probability Visualizer Tool, which provides native support for this model, its chat format, username customization, probability averaging, and more.

Clone that repo, download this model's GGUF, add this model and Llama 2 7B to the `MODELS` dictionary in `app.py`, and run it.