Commit ee99e8c
Parent: ec9b8d8

upgrade to new default pytorch version and instructions

Files changed:
- exllamav2 scripts/auto-exl2-upload/INSTRUCTIONS.txt (+17 -12)
- exllamav2 scripts/auto-exl2-upload/auto-exl2-upload.zip (+2 -2)
- exllamav2 scripts/auto-exl2-upload/linux-setup.sh (+1 -19)
- exllamav2 scripts/auto-exl2-upload/windows-setup.bat (+4 -20)
- exllamav2 scripts/exl2-multi-quant-local/INSTRUCTIONS.txt (+19 -14)
- exllamav2 scripts/exl2-multi-quant-local/exl2-multi-quant-local.zip (+2 -2)
- exllamav2 scripts/exl2-multi-quant-local/linux-setup.sh (+1 -19)
- exllamav2 scripts/exl2-multi-quant-local/windows-setup.bat (+4 -20)
exllamav2 scripts/auto-exl2-upload/INSTRUCTIONS.txt
CHANGED
@@ -1,20 +1,26 @@
 For NVIDIA cards install the CUDA toolkit
 
-
-https://developer.nvidia.com/cuda-12-
+Recommended for all exllama features:
+https://developer.nvidia.com/cuda-12-4-0-download-archive
 
-
+Only for compatibility, doesn't support all features:
 https://developer.nvidia.com/cuda-11-8-0-download-archive
 
-
+Paths should overall lead to the version of CUDA you're going to install with.
+Path examples with 12.4:
+CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
+CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
+CUDA_PATH_V12_4: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
 
-Visual Studio with desktop development for C++ is required.
+(FOR WINDOWS ONLY) Visual Studio with desktop development for C++ is required. 2019 works fine for me.
 https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=community&rel=16&utm_medium=microsoft&utm_campaign=download+from+relnotes&utm_content=vs2019ga+button
 install the desktop development for C++ workload
 
+You might need cl.exe in PATH; for me it looks like this: "c:\program files (x86)\microsoft visual studio\2019\community\vc\tools\msvc\14.29.30133\bin\hostx64\x64"
+
 Make sure to have git and wget installed on your system.
 
-For Linux, install the build tools from your package manager.
+For Linux, you need gcc, so you'll need to install the build tools from your package manager.
 For example, on Ubuntu use: sudo apt-get install build-essential
 
 This may work with AMD cards, but only on Linux and possibly WSL2. I can't guarantee that it will work on AMD cards; I personally don't have one to test with. You may need to install extra packages before starting. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html
@@ -28,13 +34,14 @@ First setup your environment by using either windows.bat or linux.sh. If you wan
 After setup is complete you'll have a file called start-quant. Use this to run the quant script.
 
 Make sure that your storage space is 3x the model's size. To measure this, take the number of billion parameters and multiply by two, then multiply by 3; that's the recommended storage. There's a chance you may get away with 2.5x the size as well.
-Make sure to also have a lot of RAM depending on the model. I've noticed Gemma uses a lot.
+Make sure to also have a lot of RAM depending on the model. I've noticed Gemma uses a lot. If the quantizing crashes without a CUDA out-of-memory error, it most likely ran out of system memory.
 
 If you close the terminal or the terminal crashes, check the last BPW it was on and enter the remaining quants you wanted. It should be able to pick up where it left off. Don't type the BPW of completed quants, as it will start from the beginning. You may also use ctrl + c to pause at any time during the quant process.
 
-To add more options to the quantization process, you can add them to line
+To add more options to the quantization process, you can add them to line 206. All options: https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
 
-Things may break in the future as it downloads the latest version of all the dependencies, which may change names or how they work. If something breaks, please open a discussion at https://huggingface.co/Anthonyg5005/hf-scripts/discussions
+Things may break in the future as it downloads the latest version of all the dependencies, which may change names or how they work. If something breaks, try redownloading the latest version; if that doesn't work, please open a discussion at https://huggingface.co/Anthonyg5005/hf-scripts/discussions
+For faster/easier support, join the exllama Discord server and go to community-projects, where you can find "easy exl2 quantizing": https://discord.gg/NSFwVuCjRq
 
 
 Credit to turboderp for creating exllamav2 and the exl2 quantization method.
@@ -44,6 +51,4 @@ Credit to oobabooga the original download and safetensors scripts.
 https://github.com/oobabooga
 
 Credit to Lucain Pouget for maintaining huggingface-hub.
-https://github.com/Wauplin
-
-Only tested with CUDA 12.1 on Windows 11 and WSL2 Ubuntu 24.04
+https://github.com/Wauplin
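The 3x storage rule in the instructions above can be sketched as a quick back-of-the-envelope calculation; a minimal sketch, assuming fp16 weights (2 bytes per parameter) and using a hypothetical 7B model as the example size:

```shell
#!/usr/bin/env bash
# Disk-space rule of thumb from the instructions: a B-billion-parameter
# model is roughly 2*B GB in fp16, and quantizing wants about 3x that free.
params_b=7                        # hypothetical example: a 7B model
fp16_gb=$((params_b * 2))         # approximate fp16 model size in GB
recommended_gb=$((fp16_gb * 3))   # 3x headroom for the quant process
echo "fp16 ~${fp16_gb} GB; recommended free space ~${recommended_gb} GB"
```

For a 7B model this comes out to roughly 14 GB of weights and 42 GB of recommended free space, matching the "maybe 2.5x works too" caveat as a lower bound of about 35 GB.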
exllamav2 scripts/auto-exl2-upload/auto-exl2-upload.zip
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:0282faa2e9e5f59ad353505dcac6c7fa581faead8da3e5d94584e636a5f8f647
+size 8497
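The .zip entries in this commit are Git LFS pointer files rather than the archives themselves. A minimal sketch of pulling the oid and size fields out of such a pointer (the pointer text is the new one from the diff above):

```shell
#!/usr/bin/env bash
# Parse the two interesting fields of a git-lfs pointer file.
pointer='version https://git-lfs.github.com/spec/v1
oid sha256:0282faa2e9e5f59ad353505dcac6c7fa581faead8da3e5d94584e636a5f8f647
size 8497'
oid=$(printf '%s\n' "$pointer" | awk '$1 == "oid" {print $2}')
size=$(printf '%s\n' "$pointer" | awk '$1 == "size" {print $2}')
echo "oid=${oid}"
echo "size=${size} bytes"
```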
exllamav2 scripts/auto-exl2-upload/linux-setup.sh
CHANGED
@@ -39,20 +39,11 @@ fi
 # if CUDA version 12 install pytorch for 12.1, else if CUDA 11 install pytorch for 11.8. If ROCm, install pytorch for ROCm 5.7
 read -p "Please enter your GPU compute version, CUDA 11/12 or AMD ROCm (11, 12, rocm): " pytorch_version
 
-# ask to install flash attention
-echo "Flash attention is a feature that could fix overflow issues on some more broken models, however, it will increase install time by a few hours."
-read -p "Would you like to install flash-attention? (rarely needed and optional) (y/n) " flash_attention
-if [ "$flash_attention" != "y" ] && [ "$flash_attention" != "n" ]; then
-    echo "Invalid input. Please enter y or n."
-    read -p "Press enter to continue"
-    exit
-fi
-
 if [ "$pytorch_version" = "11" ]; then
     echo "Installing PyTorch for CUDA 11.8"
     venv/bin/python -m pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade
 elif [ "$pytorch_version" = "12" ]; then
-    echo "Installing PyTorch for CUDA 12.
+    echo "Installing PyTorch for CUDA 12.4"
     venv/bin/python -m pip install torch
 elif [ "$pytorch_version" = "rocm" ]; then
     echo "Installing PyTorch for AMD ROCm 5.7"
@@ -100,15 +91,6 @@ echo "#!/bin/bash" > enter-venv.sh
 echo "bash --init-file venv/bin/activate" >> enter-venv.sh
 chmod +x enter-venv.sh
 
-if [ "$flash_attention" = "y" ]; then
-    echo "Going to attempt to install flash attention but it isn't required."
-    echo "You may close now if you'd like and continue without flash attention."
-    read -p "Press enter to continue and install flash attention"
-    echo "Get some popcorn and watch a movie, this will take a while."
-    echo "Installing flash-attn..."
-    venv/bin/python -m pip install git+https://github.com/Dao-AILab/flash-attention.git
-fi
-
 echo "If you use ctrl+c to stop, you may need to also use 'pkill python' to stop running scripts."
 echo "Environment setup complete. run start-quant.sh to start the quantization process."
 read -p "Press enter to exit"
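The script above asks you to type 11, 12, or rocm yourself. A small helper sketch (not part of the script, and assuming nvcc is on PATH) that derives the CUDA answer from the installed toolkit's `nvcc --version` banner:

```shell
#!/usr/bin/env bash
# Print the installed CUDA major version so you know what to type
# at the setup script's 11/12 prompt.
if command -v nvcc >/dev/null 2>&1; then
    # nvcc --version ends with e.g. "Cuda compilation tools, release 12.4, V12.4.131"
    cuda_major=$(nvcc --version | sed -n 's/.*release \([0-9]*\)\..*/\1/p')
    echo "Answer the prompt with: ${cuda_major}"
else
    echo "nvcc not found; install the CUDA toolkit first" >&2
fi
```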
exllamav2 scripts/auto-exl2-upload/windows-setup.bat
CHANGED
@@ -40,23 +40,16 @@ if not "%exllamav2_version%"=="stable" if not "%exllamav2_version%"=="dev" (
 REM if CUDA version 12 install pytorch for 12.1, else if CUDA 11 install pytorch for 11.8
 echo CUDA compilers:
 where nvcc
+echo CUDA_HOME:
+echo %CUDA_HOME%
 set /p cuda_version="Please enter your CUDA version (11 or 12): "
 
-REM ask to install flash attention
-echo Flash attention is a feature that could fix overflow issues on some more broken models, however, it will increase install time by a few hours.
-set /p flash_attention="Would you like to install flash-attention? (rarely needed and optional) (y/n) "
-if not "%flash_attention%"=="y" if not "%flash_attention%"=="n" (
-    echo Invalid input. Please enter y or n.
-    pause
-    exit
-)
-
 if "%cuda_version%"=="11" (
     echo Installing PyTorch for CUDA 11.8...
     venv\scripts\python.exe -m pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade
 ) else if "%cuda_version%"=="12" (
-    echo Installing PyTorch for CUDA 12.
-    venv\scripts\python.exe -m pip install torch --index-url https://download.pytorch.org/whl/
+    echo Installing PyTorch for CUDA 12.4...
+    venv\scripts\python.exe -m pip install torch --index-url https://download.pytorch.org/whl/cu124 --upgrade
 ) else (
     echo Invalid CUDA version. Please enter 11 or 12.
     pause
@@ -99,15 +92,6 @@ REM create enter-venv.bat
 echo @echo off > enter-venv.bat
 echo cmd /k call venv\scripts\activate.bat >> enter-venv.bat
 
-if "%flash_attention%"=="y" (
-    echo Going to attempt to install flash attention but it isn't required.
-    echo You may close now if you'd like and continue without flash attention.
-    pause
-    echo Get some popcorn and watch a movie. This will take a while.
-    echo Installing flash-attn...
-    venv\scripts\python.exe -m pip install git+https://github.com/Dao-AILab/flash-attention.git
-)
-
 powershell -c (New-Object Media.SoundPlayer "C:\Windows\Media\tada.wav").PlaySync();
 echo Environment setup complete. run start-quant.bat to start the quantization process.
 pause
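The wheel-index choice the Windows script above makes (cu118 for CUDA 11, cu124 for CUDA 12; the Linux script instead installs plain `torch` for CUDA 12, relying on the default index) can be sketched as:

```shell
#!/usr/bin/env bash
# Map the answered CUDA version to the PyTorch wheel index used in
# windows-setup.bat above (cu118 / cu124 tags taken from the diff).
cuda_version=12   # what you'd type at the script's prompt
case "$cuda_version" in
    11) index_url="https://download.pytorch.org/whl/cu118" ;;
    12) index_url="https://download.pytorch.org/whl/cu124" ;;
    *)  echo "Invalid CUDA version. Please enter 11 or 12." >&2; exit 1 ;;
esac
echo "pip install torch --index-url ${index_url} --upgrade"
```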
exllamav2 scripts/exl2-multi-quant-local/INSTRUCTIONS.txt
CHANGED
@@ -1,20 +1,26 @@
 For NVIDIA cards install the CUDA toolkit
 
-
-https://developer.nvidia.com/cuda-12-
+Recommended for all exllama features:
+https://developer.nvidia.com/cuda-12-4-0-download-archive
 
-
+Only for compatibility, doesn't support all features:
 https://developer.nvidia.com/cuda-11-8-0-download-archive
 
-
+Paths should overall lead to the version of CUDA you're going to install with.
+Path examples with 12.4:
+CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
+CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
+CUDA_PATH_V12_4: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
 
-Visual Studio with desktop development for C++ is required.
+(FOR WINDOWS ONLY) Visual Studio with desktop development for C++ is required. 2019 works fine for me.
 https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=community&rel=16&utm_medium=microsoft&utm_campaign=download+from+relnotes&utm_content=vs2019ga+button
 install the desktop development for C++ workload
 
-Make sure to have git and wget installed on your system.
+You might need cl.exe in PATH; for me it looks like this: "c:\program files (x86)\microsoft visual studio\2019\community\vc\tools\msvc\14.29.30133\bin\hostx64\x64"
 
-For Linux, install the build tools from your package manager.
+Make sure to have git and wget installed on your system.
+
+For Linux, you need gcc, so you'll need to install the build tools from your package manager.
 For example, on Ubuntu use: sudo apt-get install build-essential
 
 This may work with AMD cards, but only on Linux and possibly WSL2. I can't guarantee that it will work on AMD cards; I personally don't have one to test with. You may need to install extra packages before starting. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html
@@ -27,14 +33,15 @@ First setup your environment by using either windows.bat or linux.sh. If you wan
 
 After setup is complete you'll have a file called start-quant. Use this to run the quant script.
 
-Make sure that your storage space is 3x the amount of the model's size
-Make sure to also have a lot of RAM depending on the model. I've noticed Gemma uses a lot.
+Make sure that your storage space is 3x the model's size. To measure this, take the number of billion parameters and multiply by two, then multiply by 3; that's the recommended storage. There's a chance you may get away with 2.5x the size as well.
+Make sure to also have a lot of RAM depending on the model. I've noticed Gemma uses a lot. If the quantizing crashes without a CUDA out-of-memory error, it most likely ran out of system memory.
 
 If you close the terminal or the terminal crashes, check the last BPW it was on and enter the remaining quants you wanted. It should be able to pick up where it left off. Don't type the BPW of completed quants, as it will start from the beginning. You may also use ctrl + c to pause at any time during the quant process.
 
-To add more options to the quantization process, you can add them to line
+To add more options to the quantization process, you can add them to line 153. All options: https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
 
-Things may break in the future as it downloads the latest version of all the dependencies, which may change names or how they work. If something breaks, please open a discussion at https://huggingface.co/Anthonyg5005/hf-scripts/discussions
+Things may break in the future as it downloads the latest version of all the dependencies, which may change names or how they work. If something breaks, try redownloading the latest version; if that doesn't work, please open a discussion at https://huggingface.co/Anthonyg5005/hf-scripts/discussions
+For faster/easier support, join the exllama Discord server and go to community-projects, where you can find "easy exl2 quantizing": https://discord.gg/NSFwVuCjRq
 
 
 Credit to turboderp for creating exllamav2 and the exl2 quantization method.
@@ -44,6 +51,4 @@ Credit to oobabooga the original download and safetensors scripts.
 https://github.com/oobabooga
 
 Credit to Lucain Pouget for maintaining huggingface-hub.
-https://github.com/Wauplin
-
-Only tested with CUDA 12.1 on Windows 11 and WSL2 Ubuntu 24.04
+https://github.com/Wauplin
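The resume advice in the instructions above (after a crash, re-enter only the BPWs that haven't finished yet) can be sketched like this; the BPW values are hypothetical examples, not output from the actual quant script:

```shell
#!/usr/bin/env bash
# Given the quants originally requested and the ones already completed,
# print only the remaining BPWs to re-enter after a crash.
wanted="8.0 6.5 5.0 4.25 3.0"     # hypothetical original request
completed="8.0 6.5"               # BPWs that already finished
remaining=""
for bpw in $wanted; do
    case " $completed " in
        *" $bpw "*) ;;                       # already done, skip it
        *) remaining="$remaining $bpw" ;;    # still to quantize
    esac
done
echo "Re-enter these quants:${remaining}"
```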
exllamav2 scripts/exl2-multi-quant-local/exl2-multi-quant-local.zip
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:e5c9eef354fe8bc983870772b2fbfa47af7b0dc5d8e8dbb5b10fae8f5e9ba8cb
+size 7280
exllamav2 scripts/exl2-multi-quant-local/linux-setup.sh
CHANGED
@@ -39,20 +39,11 @@ fi
 # if CUDA version 12 install pytorch for 12.1, else if CUDA 11 install pytorch for 11.8. If ROCm, install pytorch for ROCm 5.7
 read -p "Please enter your GPU compute version, CUDA 11/12 or AMD ROCm (11, 12, rocm): " pytorch_version
 
-# ask to install flash attention
-echo "Flash attention is a feature that could fix overflow issues on some more broken models, however, it will increase install time by a few hours."
-read -p "Would you like to install flash-attention? (rarely needed and optional) (y/n) " flash_attention
-if [ "$flash_attention" != "y" ] && [ "$flash_attention" != "n" ]; then
-    echo "Invalid input. Please enter y or n."
-    read -p "Press enter to continue"
-    exit
-fi
-
 if [ "$pytorch_version" = "11" ]; then
     echo "Installing PyTorch for CUDA 11.8"
     venv/bin/python -m pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade
 elif [ "$pytorch_version" = "12" ]; then
-    echo "Installing PyTorch for CUDA 12.
+    echo "Installing PyTorch for CUDA 12.4"
     venv/bin/python -m pip install torch
 elif [ "$pytorch_version" = "rocm" ]; then
     echo "Installing PyTorch for AMD ROCm 5.7"
@@ -100,15 +91,6 @@ echo "#!/bin/bash" > enter-venv.sh
 echo "bash --init-file venv/bin/activate" >> enter-venv.sh
 chmod +x enter-venv.sh
 
-if [ "$flash_attention" = "y" ]; then
-    echo "Going to attempt to install flash attention but it isn't required."
-    echo "You may close now if you'd like and continue without flash attention."
-    read -p "Press enter to continue and install flash attention"
-    echo "Get some popcorn and watch a movie, this will take a while."
-    echo "Installing flash-attn..."
-    venv/bin/python -m pip install git+https://github.com/Dao-AILab/flash-attention.git
-fi
-
 echo "If you use ctrl+c to stop, you may need to also use 'pkill python' to stop running scripts."
 echo "Environment setup complete. run start-quant.sh to start the quantization process."
 read -p "Press enter to exit"
exllamav2 scripts/exl2-multi-quant-local/windows-setup.bat
CHANGED
@@ -40,23 +40,16 @@ if not "%exllamav2_version%"=="stable" if not "%exllamav2_version%"=="dev" (
 REM if CUDA version 12 install pytorch for 12.1, else if CUDA 11 install pytorch for 11.8
 echo CUDA compilers:
 where nvcc
+echo CUDA_HOME:
+echo %CUDA_HOME%
 set /p cuda_version="Please enter your CUDA version (11 or 12): "
 
-REM ask to install flash attention
-echo Flash attention is a feature that could fix overflow issues on some more broken models, however, it will increase install time by a few hours.
-set /p flash_attention="Would you like to install flash-attention? (rarely needed and optional) (y/n) "
-if not "%flash_attention%"=="y" if not "%flash_attention%"=="n" (
-    echo Invalid input. Please enter y or n.
-    pause
-    exit
-)
-
 if "%cuda_version%"=="11" (
     echo Installing PyTorch for CUDA 11.8...
     venv\scripts\python.exe -m pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade
 ) else if "%cuda_version%"=="12" (
-    echo Installing PyTorch for CUDA 12.
-    venv\scripts\python.exe -m pip install torch --index-url https://download.pytorch.org/whl/
+    echo Installing PyTorch for CUDA 12.4...
+    venv\scripts\python.exe -m pip install torch --index-url https://download.pytorch.org/whl/cu124 --upgrade
 ) else (
     echo Invalid CUDA version. Please enter 11 or 12.
     pause
@@ -99,15 +92,6 @@ REM create enter-venv.bat
 echo @echo off > enter-venv.bat
 echo cmd /k call venv\scripts\activate.bat >> enter-venv.bat
 
-if "%flash_attention%"=="y" (
-    echo Going to attempt to install flash attention but it isn't required.
-    echo You may close now if you'd like and continue without flash attention.
-    pause
-    echo Get some popcorn and watch a movie. This will take a while.
-    echo Installing flash-attn...
-    venv\scripts\python.exe -m pip install git+https://github.com/Dao-AILab/flash-attention.git
-)
-
 powershell -c (New-Object Media.SoundPlayer "C:\Windows\Media\tada.wav").PlaySync();
 echo Environment setup complete. run start-quant.bat to start the quantization process.
 pause