AssertionError: Total sequence length exceeds cache size in model.forward
I'm getting this error when running past 2k context, despite having the model loaded for 32k on RunPod on an A6000.
I believe it is related to this: https://github.com/oobabooga/text-generation-webui/issues/5750#issuecomment-2024442282
But I am not knowledgeable enough to be sure.
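From what I can gather from that issue, the assertion fires when the KV cache is allocated smaller than the context you actually feed the model. A rough sketch of the mismatch, assuming the exllamav2 Python API (the model path is a placeholder):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache

config = ExLlamaV2Config("/path/to/model")  # placeholder path
config.max_seq_len = 32768                  # model configured for 32k context

model = ExLlamaV2(config)

# If the loader allocates the KV cache smaller than the requested context...
cache = ExLlamaV2Cache(model, max_seq_len=2048)
model.load_autosplit(cache)

# ...then any forward pass whose total length passes cache.max_seq_len raises:
# AssertionError: Total sequence length exceeds cache size in model.forward
```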
I use text-generation-webui from May 19 and do not have this issue, with the 4-bit cache. What are your settings, and what version are you using?
BTW, I made a small update in config.json and tokenizer_config.json - I believe it is unrelated to your problem, but please update those files.
Max length is set to 32k, alpha value to 1, and compress_pos_emb to 1. I have tried both the 8-bit and 4-bit cache, and neither worked. I can get successful generations up to about 2k context, then it simply fails. This is also on text-generation-webui.
This is my pod template: text-generation-webui-oneclick-UI-and-API
ID: vmg0ubbuwtesbw
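For what it's worth, my understanding is that the cache is supposed to be allocated to match the configured max length. A sketch of what that looks like directly in exllamav2 with the 4-bit cache (path and sizes are placeholders):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4

config = ExLlamaV2Config("/path/to/model")  # placeholder path
config.max_seq_len = 32768                  # 32k context

model = ExLlamaV2(config)

# Size the quantized (4-bit) cache to the full context so long prompts
# don't overrun it.
cache = ExLlamaV2Cache_Q4(model, max_seq_len=config.max_seq_len)
model.load_autosplit(cache)
```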
Maybe you need to update ExLlama or text-generation-webui? I'm not sure how else to help you.