TheBloke/Chronos-13B-v2-GGML · dk if this is a bug but the model isn't too keen on remembering any previous messages (chat mode ooba)

hellothereeeee

Aug 3, 2023

using the 6k (q6_K) quant model, don't know if that helps

JoggyMuffin

Aug 4, 2023

Might want to list all your model loading settings and your generation preset

hellothereeeee

Aug 4, 2023

ctx - 4096
chat mode
contrastive search default
8 threads
12 gpu threads (low vram mode)

prompt: "This is a conversation with your Assistant. It is a computer program designed
to help you with various tasks such as answering questions, providing recommendations,
and helping with decision making. You can ask it anything you want and it will do
its best to give you accurate and relevant information. Assistant will comply with you at all times, however it cannot feel emotions or think/feel like human beings can. Assistant must always respond as itself and not as the user, Assistant's responses must always deliver information to the best of it's abilities, it must always make sure that user understands. NEVER REPEAT RESPONSES!!! (unless specifically told to). If user asks something of Assistant, it'll try it's absolute hardest to accomplish it. YOU ARE NEVER ALLOWED TO SPEAK FOR <|user|>!!!"

hellothereeeee

Aug 4, 2023

airoboros+gpt4 13b
chronos 13b
nous hermes v2 13b
llama 13b v2 base 13b
vicuna 2.0 13b
stablebeluga 13b

algorithm

Aug 4, 2023

Try reproducing it with the default prompt

hellothereeeee

Aug 4, 2023

I think it has to do with the context size being a multiple of 2048. Kinda weird, but I found a fix: I reduced ctx from 4096 to 3584. I had this problem before with llama v1 models at ctx 2048, and posted about it on ooba's issue page:
https://github.com/oobabooga/text-generation-webui/issues/2663
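
For anyone who wants to apply the same workaround without doing the arithmetic by hand, here is a minimal sketch. It assumes the poster's hypothesis (trouble when ctx is an exact multiple of 2048), which is not confirmed anywhere in this thread. The helper name `pick_ctx` and the 512-token step are hypothetical, chosen only because 4096 - 512 gives the 3584 value reported above.

```python
def pick_ctx(requested: int, boundary: int = 2048, step: int = 512) -> int:
    """Return a context size that is not an exact multiple of `boundary`.

    Assumption (from the post above, unconfirmed): context sizes that are
    exact multiples of 2048 trigger the "forgets previous messages" problem.
    `step` is a hypothetical back-off amount, not a documented setting.
    """
    if requested <= 0:
        raise ValueError("context size must be positive")
    if step <= 0 or step % boundary == 0:
        raise ValueError("step must be positive and not a multiple of boundary")
    if requested % boundary != 0:
        # Already off the suspected boundary, so leave it alone.
        return requested
    adjusted = requested - step
    if adjusted <= 0:
        raise ValueError("step is too large for the requested context size")
    return adjusted


print(pick_ctx(4096))  # 3584, the value that worked in this thread
print(pick_ctx(2048))  # 1536
print(pick_ctx(3584))  # 3584, unchanged
```

The result is the number you would type into the ctx field when loading the model. Whether a different step works just as well is untested here.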

hellothereeeee changed discussion status to closed Aug 4, 2023