RuntimeError: expected scalar type Float but found Half
The model seems to load fine, but trying to generate text with it just throws "RuntimeError: expected scalar type Float but found Half".
Any idea what this could be? I loaded the model as bfloat16.
Edit: my bad. I loaded the model as OPT instead of GPTJ in oobabooga. >_<
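For anyone who lands here later: this error is a plain dtype mismatch, usually float32 ("Float") activations hitting float16 ("Half") weights somewhere in the forward pass. A minimal sketch of how it reproduces in plain PyTorch, assuming that's the cause here too (the exact call site inside the webui may differ):

```python
import torch

# Half-precision weights, like those in a fp16/4-bit checkpoint.
weight = torch.randn(8, 8, dtype=torch.float16)

# Float32 activations, e.g. from a module that was never cast down.
x = torch.randn(1, 8, dtype=torch.float32)

# Mixing dtypes in the matmul raises a RuntimeError along the lines of
# "expected scalar type Float but found Half" (exact wording varies by
# PyTorch version and backend).
try:
    y = x @ weight.T
except RuntimeError as e:
    print(e)

# Casting both sides to the same dtype fixes it.
y = x.to(weight.dtype) @ weight.T
print(y.dtype)  # torch.float16
```

So the fix is always to make the loader cast everything to one dtype, which is what picking the correct model type does for you.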
Has anybody gotten this model to load in oobabooga? I'm also getting "RuntimeError: expected scalar type Float but found Half".
Using --wbits 4 --groupsize 128 (no model_type given)
I've seen other people set model_type to gptj, but then it runs slow as hell.
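For what it's worth, GPT4-X-Alpaca is a LLaMA fine-tune, so if a model_type is needed at all it should be llama, not gptj or opt. A sketch of the full launch command, assuming the spring-2023 text-generation-webui flag names (they may have changed on newer builds):

```
python server.py --model GPT4-X-Alpaca-30B-Int4 --wbits 4 --groupsize 128 --model_type llama
```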
I'm having no problem loading MetaIX/GPT4-X-Alpaca-30B-Int4/gpt4-x-alpaca-30b-128g-4bit.safetensors with just --wbits 4 --groupsize 128.
I think that's just what you have to deal with. 30 billion parameters is a LOT of data. I'm running this on a 4090 and get around 0.4 tokens/s, while a 13B model gives me more like 9 tokens/s.
Jakxx, the only problem is that my other 30-billion-parameter model performs OK on a 3090:
Found the following quantized model: models/GPT4-X-Alpaca-30B-Int4/gpt4-x-alpaca-30b-128g-4bit.safetensors
Loading model ...
Done.
Loaded the model in 18.39 seconds.
Output generated in 9.74 seconds (6.26 tokens/s, 61 tokens, context 218, seed 3259645)
Try MetaIX/GPT4-X-Alpaca-30B-Int4 on your 4090.
I believe this model was quantized with a nonstandard fork of GPTQ-for-LLaMa, while most people are using the standard GPTQ-for-LLaMa with text-generation-webui. Hopefully we'll see another quantization of this model.
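For reference, the standard qwopqwop200/GPTQ-for-LLaMa repo quantizes a LLaMA checkpoint roughly like this (a sketch based on its early-2023 README; the model path and output filename are illustrative, and flags may have changed since):

```
python llama.py /path/to/gpt4-x-alpaca-30b c4 --wbits 4 --groupsize 128 --save_safetensors gpt4-x-alpaca-30b-128g-4bit.safetensors
```

A checkpoint produced this way should load with the standard GPTQ-for-LLaMa code that text-generation-webui ships with, which is presumably why the MetaIX quant works out of the box.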
Aah well, here's hoping.