VRAM usage
Can you guys tell me the VRAM usage of this model? I have a 3080 Ti laptop with 8 GB.
Thanks
8-9 GB of VRAM is required.
I see 8.7-8.9 GB used on my 16 GB laptop 3080 with the model loaded in oobabooga. It goes up to 12.2 GB when it's actually generating text.
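To check this on your own card, you can just watch GPU memory while the model loads and generates; plain nvidia-smi is enough (nothing specific to this model or to the webui):
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
(-l 1 refreshes the readout every second)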
8 GB cards can only load it with about 50% of the layers offloaded to the CPU.
@Yuuru How can I try this?
Curious whether a 12 GB 3060 is able to run this model or not.
@cyx123 I have a 12 GB 3060 and it has no problem running any 13B model; they fluctuate between 9 and 11 GB of VRAM usage.
I got "RuntimeError: CUDA error: out of memory" with an NVIDIA 3070 8 GB.
I could make it work with 8 GB of VRAM, though slowly:
Output generated in 19.02 seconds (0.63 tokens/s, 12 tokens, context 132)
Output generated in 47.36 seconds (1.03 tokens/s, 49 tokens, context 230)
Output generated in 28.04 seconds (0.96 tokens/s, 27 tokens, context 363)
From https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g/discussions/14:
Edit start-webui.bat and replace all the text with:
@echo off
@echo Starting the web UI...
cd /D "%~dp0"
set MAMBA_ROOT_PREFIX=%cd%\installer_files\mamba
set INSTALL_ENV_DIR=%cd%\installer_files\env
if not exist "%MAMBA_ROOT_PREFIX%\condabin\micromamba.bat" (
call "%MAMBA_ROOT_PREFIX%\micromamba.exe" shell hook >nul 2>&1
)
call "%MAMBA_ROOT_PREFIX%\condabin\micromamba.bat" activate "%INSTALL_ENV_DIR%" || ( echo MicroMamba hook not found. && goto end )
cd text-generation-webui
rem --pre_layer 30 = number of layers kept on the GPU; the rest are offloaded to the CPU (fits in 8 GB of VRAM, at the cost of speed)
call python server.py --auto-devices --chat --threads 8 --wbits 4 --groupsize 128 --pre_layer 30
:end
pause
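If you're not on the Windows one-click installer, the equivalent should just be activating whatever Python environment your text-generation-webui install uses and running the server directly with the same flags as in the .bat above:
cd text-generation-webui
python server.py --auto-devices --chat --threads 8 --wbits 4 --groupsize 128 --pre_layer 30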
Yeah, I'm also getting about 1 token per second with splitting on 8 GB of VRAM; the performance is bad. I was able to achieve the same speed using a ggml model + llama.cpp with DRAM.
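For anyone wanting to try that route, the llama.cpp side looks roughly like this (the binary name, flags, and ggml filename below are a typical example of running a 4-bit ggml quantization on the CPU, not something taken from this thread):
./main -m models/ggml-vicuna-13b-q4_0.bin -t 8 -n 256 -p "Hello, how are you?"
(-m is the ggml model file, -t the number of CPU threads, -n how many tokens to generate, -p the prompt; the model runs from system RAM instead of VRAM)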
Is it possible to somehow run it on 6 GB of VRAM? I have a laptop with an RTX 3060.
So far I'm getting a CUDA out of memory message.
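One thing that might be worth trying on 6 GB (untested, just extrapolating from the start-webui.bat above): lower the --pre_layer value so fewer layers are kept in VRAM, e.g. change the python line to:
call python server.py --auto-devices --chat --threads 8 --wbits 4 --groupsize 128 --pre_layer 20
The 20 is a guess; reduce it until the out-of-memory error goes away, and expect generation to get even slower the lower you go.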