Extreme slowdown and weird output

#4
by abhimortal6 - opened

Tried it in the oobabooga webui; not usable in my case.
3060 Ti (8 GB VRAM), 24 GB RAM

The output is weird, and it never returns the code.

Output generated in 27.06 seconds (0.30 tokens/s, 8 tokens, context 63, seed 1191894163)
Output generated in 66.12 seconds (0.47 tokens/s, 31 tokens, context 80, seed 1706855517)
Output generated in 386.01 seconds (0.04 tokens/s, 16 tokens, context 131, seed 1791131008)
Output generated in 50.16 seconds (0.48 tokens/s, 24 tokens, context 118, seed 1161001351)
Output generated in 23.89 seconds (0.04 tokens/s, 1 tokens, context 150, seed 1752912455)
Output generated in 202.32 seconds (0.05 tokens/s, 10 tokens, context 169, seed 1726966570)
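
Sub-1-token/s throughput on a 4-bit model usually means weights are being paged between system RAM and the GPU rather than running fully on the card. One quick check from the Python session where the model is loaded (the `model` handle here is hypothetical, standing in for whatever object the loader returns; this is a sketch, not an oobabooga API):

    # print any parameters that did not land on the GPU (hypothetical `model` handle)
    for name, p in model.named_parameters():
        if p.device.type != "cuda":
            print(name, p.device)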

(Screenshot attached: Screenshot 2023-06-05 014426.png)

This is a quantized 15B model. Also, how did you get it to run?

Sure, the title says so; the quality is only marginally decreased, though. To run it in the webui, use the configs from:
https://huggingface.co/ShipItMind/starcoder-gptq-4bit-128g
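
For reference, a typical oobabooga launch with those settings would look something like this (flag names taken from oobabooga's GPTQ loader of that era, and the model directory name is a placeholder; check the linked repo for the exact configuration):

    python server.py --model starcoder-gptq-4bit-128g --wbits 4 --groupsize 128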

Hey, I just pushed some new fixes.
Can you give those a try?

Sorry, but where are the updated files? The repo still shows it was last updated a month ago.

The fixes are in the code repo: https://github.com/mayank31398/GPTQ-for-SantaCoder
The weights are the same.

OOM on the same setup:
3060 Ti (8 GB VRAM), 24 GB RAM

 python -m santacoder_inference bigcode/starcoder --wbits 4 --groupsize 128 --load starcoder-GPTQ-4bit-128g/model.pt
Traceback (most recent call last):
  File "/home/abhi/miniconda3/envs/gptq/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/abhi/miniconda3/envs/gptq/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/abhi/Documents/starcoder/GPTQ-for-SantaCoder/santacoder_inference.py", line 96, in <module>
    main()
  File "/home/abhi/Documents/starcoder/GPTQ-for-SantaCoder/santacoder_inference.py", line 86, in main
    model = get_santacoder(args.model, args.load, args.wbits, args.groupsize)
  File "/home/abhi/Documents/starcoder/GPTQ-for-SantaCoder/santacoder_inference.py", line 49, in get_santacoder
    model = model.cuda()
  File "/home/abhi/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/nn/modules/module.py", line 905, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/abhi/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/abhi/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/abhi/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/abhi/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/nn/modules/module.py", line 844, in _apply
    self._buffers[key] = fn(buf)
  File "/home/abhi/miniconda3/envs/gptq/lib/python3.9/site-packages/torch/nn/modules/module.py", line 905, in <lambda>
    return self._apply(lambda t: t.cuda(device))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 72.00 MiB (GPU 0; 7.78 GiB total capacity; 6.68 GiB already allocated; 75.31 MiB free; 6.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
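
The traceback itself points at one mitigation: setting max_split_size_mb through PYTORCH_CUDA_ALLOC_CONF to reduce allocator fragmentation. A minimal sketch (this only helps when reserved memory far exceeds allocated memory; it cannot create VRAM that isn't there):

    import os
    # must be set before torch initializes CUDA, i.e. before the first torch import
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
    import torch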

Yeah, it's not supposed to work on a 3060 Ti; 8 GB of VRAM isn't enough.
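
A back-of-the-envelope estimate shows why (assuming StarCoder's roughly 15.5B parameters; the exact count is approximate):

    # rough size of the 4-bit weights alone, ignoring activations and the KV cache
    params = 15.5e9                         # approximate parameter count (assumption)
    weights_gib = params * 0.5 / 1024**3    # 4 bits = 0.5 bytes per parameter
    print(f"~{weights_gib:.1f} GiB")        # ~7.2 GiB for weights alone

With the CUDA context, activations, and the KV cache on top of ~7.2 GiB of weights, an 8 GB card has no headroom left.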

alright, closing.

abhimortal6 changed discussion status to closed
