---
license: apache-2.0
datasets:
- hudsongouge/Open-Discord
language:
- en
pipeline_tag: text-generation
tags:
- casual
- chat
---
# Chatter-70M

This repository contains the Chatter-70M model, a lightweight casual chat language model, available in multiple formats:
- GGUF format (quantized and FP16 versions)
- SafeTensors format
## Model Variants

- `model-Q4_K_M.gguf`: 4-bit quantized version
- `model-Q6_K.gguf`: 6-bit quantized version
- `model-Q8_0.gguf`: 8-bit quantized version
- `model-FP16.gguf`: 16-bit floating-point version
- `model.safetensors`: original model weights in SafeTensors format
## Configuration Files

The repository includes several configuration files:

- `config.json`: model configuration
- `generation_config.json`: generation parameters
- `tokenizer_config.json`: tokenizer configuration
- `special_tokens_map.json`: special tokens mapping
- `tokenizer.json` & `tokenizer.model`: tokenizer files
- `training_args.bin`: training arguments
## Training

This model was trained for 1 epoch on the Open Discord Dataset and a few other small non-public datasets, totaling 705.8 MB of uncompressed text from Discord. It trained for approximately 14 hours on my laptop (M3 Max, 30-core GPU) with a peak learning rate of 4e-4, cosine decay, and 100 warmup steps. I used a batch size of 16 with 8 gradient accumulation steps and an AdamW optimizer with betas 0.9 and 0.99. It was trained exclusively on samples 1024 tokens in length.
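The schedule described above can be sketched as follows. This is an illustrative reconstruction, not the actual training script; `total_steps` is a placeholder, since the real step count depends on the dataset size.

```python
import math

# Hyperparameters from the description above (illustrative reconstruction).
batch_size = 16
grad_accum_steps = 8
effective_batch = batch_size * grad_accum_steps  # 128 sequences of 1024 tokens per optimizer step

peak_lr = 4e-4
warmup_steps = 100
total_steps = 10_000  # placeholder; not stated in the model card

def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))
```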
## Architecture

It uses a Llama-3 architecture:

```python
LlamaConfig(
    hidden_size=512,
    intermediate_size=1024,
    num_attention_heads=8,
    num_hidden_layers=16,
    num_key_value_heads=4,
    rms_norm_eps=1e-5,
    vocab_size=32000,
    max_position_embeddings=4096,
    torch_dtype=bfloat16,
    hidden_dropout_prob=0.2,
    attention_probs_dropout_prob=0.2,
    attention_bias=True,
)
```
It uses the Llama-2 7B tokenizer.
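A quick back-of-the-envelope check on the parameter count implied by this config (ignoring the small RMSNorm weights and attention biases, and assuming an untied `lm_head`):

```python
# Rough parameter-count estimate for the LlamaConfig above.
hidden = 512
inter = 1024
layers = 16
heads = 8
kv_heads = 4
vocab = 32000

head_dim = hidden // heads    # 64
kv_dim = kv_heads * head_dim  # 256 (grouped-query attention)

attn = hidden * hidden * 2 + hidden * kv_dim * 2  # q, o + k, v projections
mlp = hidden * inter * 3                          # gate, up, down projections
per_layer = attn + mlp
embeddings = vocab * hidden * 2                   # input embeddings + untied lm_head

total = per_layer * layers + embeddings
print(f"{total / 1e6:.1f}M parameters")  # prints "70.5M parameters", matching the model name
```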
## Chat Format

TBH, I didn't expect this model to do so well. So, the chat format is a little unconventional:

```
[username] at [timestamp]:
content
[username] at [timestamp]:
content
```

Example:

```
Hudson at 2025-03-07T20:47:54.374-05:00:
hi
Bot at 2025-03-07T20:48:17.635-05:00:
hi hudson
Hudson at 2025-03-07T20:48:17.635-05:00:
how are you?
Bot at 2025-03-07T20:48:27.635-05:00:
i am good at the moment
```
## Recommended Usage
Chatter 70M is coherent on its own, but it is very stupid. Because it was trained on discord data, it is also very sensitive to usernames. Using different usernames can produce different styles. For example, the username "Bot" will make it respond in an AI-assistant-like style. More casual discord usernames will cause it to respond more casually. I recommend playing around with this to find the best user and assistant usernames for your purpose. Also, I recommend using a low temperature (probably about 0.5) for coherent output. Here is a sample:
```
jameui at 2019-05-13T18:46:02.66-04:00:
@Deleted User the gov is a thing
themaka at 2019-05-13T18:46:09.257-04:00:
You could've got it here if you want to check and then you can't find the image.
Deleted User at 2019-05-13T18:46:21.716-04:00:
lmao
Deleted User at 2019-05-13T18:46:37.228-04:00:
they're not in the server
Deleted User at 2019-05-13T18:46:44.609-04:00:
wait what
jameui at 2019-05-13T18:46:47.762-04:00:
Lol
jameui at 2019-05-13T18:47:08.378-04:00:
I just said it was a joke
themaka at 2019-05-13T18:47:26.856-04:00:
You can, @Deleted User @Deleted User if you've got a job on this chat and then get the last one I will be banned from my server, not 1013
jameui at 2019-05-13T18:47:31.673-04:00:
lol
jameui at 2019-05-13T18:47:35.129-04:00:
lol
jameui at 2019-05-13T18:47:37.202-04:00:
oof
Deleted User at 2019-05-13T18:47:46.574-04:00:
wow
jameui at 2019-05-13T18:47:55.176-04:00:
I don't think so
themaka at 2019-05-13T18:47:56.823-04:00:
I'm not sure about that.
```
As you can see, it looks like a discord chat. However, it isn't very smart.
Because of that, I recommend averaging probabilities across this model and Llama2-7B. That produces much better results while still maintaining the casual language:
```
Hudson at 2025-03-09T08:02:57.857+05:00:
hi
NeverUsedDC at 2025-03-09T08:03:00.670+05:00:
hello
Hudson at 2025-03-09T08:03:03.128+05:00:
how are you?
NeverUsedDC at 2025-03-09T08:03:06.081+05:00:
good
Hudson at 2025-03-09T08:03:13.063+05:00:
Working on anything cool lately?
NeverUsedDC at 2025-03-09T08:03:16.373+05:00:
yeah, just finished a project for a client
Hudson at 2025-03-09T08:03:22.529+05:00:
what was the project?
NeverUsedDC at 2025-03-09T08:03:25.950+05:00:
it was a website for a new startup
Hudson at 2025-03-09T08:03:37.603+05:00:
oh, that's cool! What startup?
NeverUsedDC at 2025-03-09T08:04:47.741+05:00:
the startup is called "EcoClean" and they're focused on sustainable cleaning products
Hudson at 2025-03-09T08:05:09.089+05:00:
Cool! What's your favorite programming language btw?
NeverUsedDC at 2025-03-09T08:05:36.843+05:00:
i'm more of a backend dev, so i'd say python is my go-to language
Hudson at 2025-03-09T08:05:42.235+05:00:
How do you feel about Rust?
NeverUsedDC at 2025-03-09T08:05:46.480+05:00:
haven't used it much, but i've heard good things
Hudson at 2025-03-09T08:09:11.898+05:00:
btw, where do you live?
NeverUsedDC at 2025-03-09T08:11:04.133+05:00:
lol
Hudson at 2025-03-09T08:11:36.795+05:00:
So, you won't tell me?
NeverUsedDC at 2025-03-09T08:11:41.110+05:00:
Haha, nope
Hudson at 2025-03-09T08:12:19.944+05:00:
Okay, fine. I've been working on an AI project lately.
NeverUsedDC at 2025-03-09T08:12:24.337+05:00:
oh
Hudson at 2025-03-09T08:19:08.059+05:00:
what do you think of it?
NeverUsedDC at 2025-03-09T08:19:08.059+05:00:
it's cool
```
See? Best of both worlds. You can adjust the balance between the two models to get different results. Chatter 70M on its own generally prefers simple responses like "lol", "oh", and "idk". It can and will generate grammatically correct sentences, but most of the time they carry little real meaning or are irrelevant. That is why Llama is helpful: Llama 2 crafts longer responses and responds more like an AI assistant.
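The averaging itself is simple: convert each model's next-token logits into probabilities, then take a weighted mean. Below is a minimal numpy sketch of that idea, not the visualizer tool's actual code; it assumes both models share a vocabulary, which is why using the Llama-2 tokenizer matters.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 0.5) -> np.ndarray:
    """Temperature-scaled softmax over a vector of next-token logits."""
    z = (logits - logits.max()) / temperature
    e = np.exp(z)
    return e / e.sum()

def averaged_probs(chatter_logits: np.ndarray, llama_logits: np.ndarray,
                   weight: float = 0.5) -> np.ndarray:
    """Weighted mean of the two models' next-token distributions.

    weight=1.0 is pure Chatter, weight=0.0 is pure Llama 2; sample or pick
    argmax from the returned distribution.
    """
    return weight * softmax(chatter_logits) + (1 - weight) * softmax(llama_logits)
```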
One thing you will notice is that Chatter often responds with incomplete thoughts, perhaps just a single word. This is because on Discord, thoughts are often sent as multiple messages. I am working to add support in my app to properly represent multi-message responses.
## Running

For best results, use my LLM Probability Visualizer Tool, which provides native support for this model, its chat format, username customization, probability averaging, and more.

Clone that repo, download this model's GGUF, add this model and Llama 2 7B to the `MODELS` dictionary in `app.py`, and run it.