Hardware Requirements

#1
by Lightchain - opened

Very interesting model. Does anyone have info on what hardware is required to run it?

You will need ~80 GB of memory for inference at 16-bit. Half that for 8-bit, and a quarter of that for 4-bit.
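The figures above follow directly from the weight storage: bytes = parameters × bits ÷ 8. A quick sketch (the ~40B parameter count is an assumption, back-solved from the ~80 GB at 16-bit figure; KV cache and activations are not included):

```python
# Rough memory needed to hold the model weights alone
# (ignores KV cache, activations, and runtime overhead).
def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# ~40B parameters is assumed, inferred from the ~80 GB @ 16-bit figure above.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(40, bits):.0f} GB")
```

Halving the bit width halves the weight memory, which is why 8-bit and 4-bit land at roughly half and a quarter of the 16-bit footprint.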

I just ran it at 16-bit on an A100 SXM with 80 GB of VRAM.

With llama.cpp, this model with Q4_K_M quantization and a 15,000-token context fits on a single RTX 3090 or 4090 (24 GB VRAM). Its performance doesn't seem to be affected much - at least based on my limited testing on a set of 50 reasoning puzzles.

You can also run it on CPU if you have 32 GB of RAM.

@Ainonake How can I set it up on my local CPU? (Answers from others are also welcome.)
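A minimal llama.cpp setup sketch for CPU inference, assuming you can obtain a Q4_K_M GGUF of the model (the model filename below is a placeholder, not the actual file name):

```shell
# Build llama.cpp (CPU-only by default)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Place the downloaded Q4_K_M GGUF file next to the binary, then run on CPU.
# "model-Q4_K_M.gguf" is a placeholder filename; -c sets the context size.
./build/bin/llama-cli -m model-Q4_K_M.gguf -c 15000 -p "Hello"
```

This runs entirely on CPU; if you later add a GPU, `-ngl` offloads layers to it.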
