Does not stop after end of answer, hallucinates the whole conversation

#2
by underlines - opened

Reproduction

I am loading the model with the latest text-generation-webui (from 2 hours ago), using the revamped --chat mode that supports Mode: instruction.
In this mode you can select an Instruction Template in the UI, so I created a yaml file in \characters\instruction-following\ following the example file Alpaca.yaml:

name: "### Response:"
your_name: "### Instruction:"
context: "Below is an instruction that describes a task. Write a response that appropriately completes the request."

from which I derived Vicuna.yaml:

name: "### Assistant:"
your_name: "### Human:"
context: "Below is an instruction that describes a task. Write a response that appropriately completes the request."

And started it with

python server.py --model elinas_vicuna-13b-4bit --wbits 4 --groupsize 128 --chat --listen

Switched to Mode: Instruction and selected the Vicuna template.

Issue

When asked to follow an instruction, it answers as the Assistant, but then, instead of stopping, hallucinates further Human instructions and Assistant replies.

Questions

  • Is this a problem with the EOS token, like this commit? (A quick way to inspect the tokenizer's EOS token is sketched below.)
  • Or is this a text-generation-webui issue?
  • Or a problem with the Vicuna.yaml I created?
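
To narrow it down, it is worth checking which EOS token the model's tokenizer actually defines. A minimal sketch, assuming the tokenizer files ship with the quantized model and that the Hub id is elinas/vicuna-13b-4bit (the local folder name above suggests this, but it is an assumption):

# Print the EOS token the tokenizer defines; if generation never
# emits this token, the webui has nothing to stop on.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("elinas/vicuna-13b-4bit")  # assumed Hub id
print(repr(tok.eos_token), tok.eos_token_id)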

Example:

[screenshot: the model continues past its answer with hallucinated Human/Assistant turns]

It is an EOS-token issue, and I am currently holding off on quantizing more models until there is better standardization in GPTQ. The original code author is pushing Triton, which would lock out Windows users unless they use WSL and maintain both a CUDA and a Triton version. It's simply becoming tiring to deal with all of these breaking changes, having to re-quantize and revert to previous commits; that should not be happening.
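
Until EOS handling is standardized, one workaround at the transformers level is to cut generation off as soon as the model starts writing a new Human turn. A minimal sketch, assuming a standard transformers model and tokenizer are already loaded; StopOnSubstring is a made-up helper, not the webui's actual mechanism:

# Stop generation once the decoded continuation contains the stop
# string (e.g. "### Human:"), so the model can't talk to itself.
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnSubstring(StoppingCriteria):
    def __init__(self, tokenizer, stop_string, prompt_len):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_len = prompt_len  # skip the prompt so it can't match

    def __call__(self, input_ids, scores, **kwargs):
        text = self.tokenizer.decode(input_ids[0][self.prompt_len:])
        return self.stop_string in text

# Usage, with `model` and `tokenizer` already loaded:
# inputs = tokenizer(prompt, return_tensors="pt")
# criteria = StoppingCriteriaList(
#     [StopOnSubstring(tokenizer, "### Human:", inputs.input_ids.shape[1])])
# out = model.generate(**inputs, stopping_criteria=criteria, max_new_tokens=512)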

As for the new chat mode, I haven't tried it; I have been taking a break from LLMs to work on other projects. In the "default" mode, you can simply lower the token limit and hit Continue for more output, or do the opposite and hit Stop.

Until an unfiltered Vicuna or something more interesting comes out, this will be my last contribution; I'll wait for everything to become more stable.

@underlines: Once this PR https://github.com/oobabooga/text-generation-webui/pull/903 lands, you should be able to use Vicuna without it talking to itself. It's been a game changer for me (I just cloned the fork).
