How to reproduce?

by robbie0 - opened

Hi again, I'm interested in experimenting with the VNTL method; I'm fully convinced that causal LMs are the way to go for feed-forward contextual translation. Unfortunately, the 27B model is a bit much for my 12 GB 4070, and the 8B model (at Q8_0) keeps returning blank lines, so I want to try training more QLoRAs myself. Would you be able to share the code you're currently using as a starting point?

That’s odd... the 8B model shouldn’t return blank lines. How are you using it?

In any case, I’m more than happy to share my code! It’s great to see more people interested in pushing this forward.
You can find the code I used for the 8B model here: https://gist.github.com/lmg-anon/f616e9406587f633295fa48d41d4ecb5

The code for the 27B model is pretty much the same, minus the hardcoded jp_token, en_token, human_token, and llm_token variables and a few other details, but I can't find it right now.

> You can find the code I used for the 8B model here: https://gist.github.com/lmg-anon/f616e9406587f633295fa48d41d4ecb5

Thank you so much! Will experiment with this over the weekend.

> That’s odd... the 8B model shouldn’t return blank lines. How are you using it?

I'm using the Q8_0 quant through llama.cpp's llama-cli and llama-server binaries, running on CUDA with full offloading. I have a program that feeds the Japanese into the model iteratively, using a sliding-window approach that tries to maximize context usage. Metadata is characters only: it is determined by filtering a hardcoded database against all of the speakers currently in context, plus the next speaker and any names present in the current line (in theory it should pick them up from previous lines too, but I got lazy). When a translation comes back from the model, excess whitespace is trimmed before it's added to the context.
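
Roughly, the program does something like this (a simplified sketch, not my actual code; the character DB entries and the exact prompt block layout are placeholders):

```python
# Simplified sketch of the sliding-window prompt builder; the DB, names, and the
# exact block layout are placeholders, not the real program or the VNTL spec.
CHARACTER_DB = {
    # English key -> metadata line (hypothetical entries; real matching would use Japanese names)
    "Yui": "[character] Name: Yui (由依) | Gender: Female",
    "Haru": "[character] Name: Haru (晴) | Gender: Male",
}

MAX_WINDOW = 20  # number of previous line pairs kept in context


def build_prompt(history, next_speaker, next_jp_line):
    """history: list of (speaker, jp_line, en_line) tuples already translated."""
    window = history[-MAX_WINDOW:]

    # Metadata is characters only: every speaker in the window, the next speaker,
    # and any database name that appears in the current Japanese line.
    speakers = {s for s, _, _ in window} | {next_speaker}
    speakers |= {name for name in CHARACTER_DB if name in next_jp_line}
    metadata = "\n".join(CHARACTER_DB[s] for s in sorted(speakers) if s in CHARACTER_DB)

    parts = ["<<METADATA>>", metadata, ""]
    for speaker, jp, en in window:
        parts += ["<<JAPANESE>>", f"[{speaker}]: {jp}", "<<ENGLISH>>", f"[{speaker}]: {en}", ""]
    parts += ["<<JAPANESE>>", f"[{next_speaker}]: {next_jp_line}", "<<ENGLISH>>"]
    # Ends with '[Speaker]: ' -- trailing space, no newline.
    return "\n".join(parts) + f"\n[{next_speaker}]: "


def add_translation(history, speaker, jp_line, raw_output):
    # Excess whitespace is trimmed before the translated line re-enters the context.
    history.append((speaker, jp_line, raw_output.strip()))
```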

Here's the problematic prompt: https://gist.github.com/robbie01/790e2cf98e927f13a24b676cccd88f00

Note well that the prompt ends with a space, but not a newline. If sampled at temperature 0, the model only yields a single additional whitespace character. This is also true for fp16.
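
To reproduce it against llama-server, something like this should be enough (a sketch; it assumes the server is on the default port and the prompt from the gist is saved to a local file):

```python
# Sketch: reproduce the behaviour against a running llama-server (default port assumed).
import json
import urllib.request

with open("problematic_prompt.txt", encoding="utf-8") as f:  # the gist contents
    prompt = f.read()  # ends with a space, no trailing newline

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps({
        "prompt": prompt,
        "temperature": 0.0,  # greedy sampling
        "n_predict": 64,
        "stop": ["\n"],
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

print(repr(result["content"]))  # comes back as just a single extra whitespace
```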

The issue with your prompt seems to be the space at the end, which prevents the model from writing ' 「', which is a single token:

[screenshot: token visualization of the prompt tail, with ' 「' highlighted as a single token]
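
You can also check this without any UI by hitting llama-server's /tokenize endpoint (a quick sketch; default port assumed):

```python
# Sketch: compare tokenization with and without the space already baked into the prompt.
import json
import urllib.request

def tokenize(text):
    req = urllib.request.Request(
        "http://127.0.0.1:8080/tokenize",
        data=json.dumps({"content": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["tokens"]

print(tokenize(" 「"))  # space + bracket merge into a single token
print(tokenize("「"))   # the bracket alone tokenizes differently
```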

Wow, I’ve been doing it wrong this whole time. Now the reason I have to trim in the first place makes sense: with the space at the end of the prompt, it always output an extra space at the beginning.

If you don’t mind, what program are you using to view that?

That's mikupad: https://github.com/lmg-anon/mikupad, with the "Monospace Dark" theme.

Thanks! By the way, I adjusted my approach in light of this: the prompt now includes everything up to and including the <<ENGLISH>>\n, since that part is guaranteed to tokenize in a specific way, and the [Speaker]: prefix is forced through a GBNF grammar like root ::= "[Speaker]: " [^\n]+ so the model can output tokens as naturally as possible.
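
The per-line request ends up looking roughly like this (a sketch; the helper name and the naive grammar construction are mine, and the server details are assumed defaults):

```python
# Sketch: the prompt ends right after '<<ENGLISH>>\n'; the '[Speaker]: ' prefix is
# forced by a per-line GBNF grammar passed to llama-server's /completion endpoint.
import json
import urllib.request

def translate_line(prompt, speaker):
    # Naive grammar construction; a real program should escape quotes/backslashes in names.
    grammar = f'root ::= "[{speaker}]: " [^\\n]+'
    req = urllib.request.Request(
        "http://127.0.0.1:8080/completion",
        data=json.dumps({
            "prompt": prompt,    # everything up to and including '<<ENGLISH>>\n'
            "grammar": grammar,  # constrains the output to '[Speaker]: <one line>'
            "n_predict": 256,
            "stop": ["\n"],
        }).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```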
