Commit 9307bce
Parent: 52333b0

add optional flash-attn

Also added a default of "no" to the delete-model prompt.
- auto-exl2-upload/INSTRUCTIONS.txt +5 -5
- auto-exl2-upload/exl2-quant.py +5 -1
- auto-exl2-upload/linux-setup.sh +30 -6
- auto-exl2-upload/windows-setup.bat +24 -3
- exl2-multi-quant-local/INSTRUCTIONS.txt +6 -6
- exl2-multi-quant-local/exl2-quant.py +5 -1
- exl2-multi-quant-local/linux-setup.sh +29 -5
- exl2-multi-quant-local/windows-setup.bat +24 -2
auto-exl2-upload/INSTRUCTIONS.txt CHANGED

@@ -8,7 +8,7 @@ https://developer.nvidia.com/cuda-11-8-0-download-archive
 
 Restart your computer after installing the CUDA toolkit to make sure the PATH is set correctly.
 
-
+Visual Studio with desktop development for C++ is required.
 https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=community&rel=16&utm_medium=microsoft&utm_campaign=download+from+relnotes&utm_content=vs2019ga+button
 install the desktop development for C++ workload
 
@@ -19,11 +19,11 @@ For example, on Ubuntu use: sudo apt-get install build-essential
 
 This may work with AMD cards but only on linux and possibly WSL2. I can't guarantee that it will work on AMD cards, I personally don't have one to test with. You may need to install stuff before starting. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html
 
-Only python 3.8 - 3.
+Only python 3.8 - 3.12 is known to work. If you have a higher/lower version of python, I can't guarantee that it will work.
 
 
 
-First setup your environment by using either windows.bat or linux.sh.
+First setup your environment by using either windows.bat or linux.sh.
 
 After setup is complete then you'll have a file called start-quant. Use this to run the quant script.
 
@@ -32,7 +32,7 @@ Make sure to also have a lot of RAM depending on the model. Have noticed gemma t
 
 If you close the terminal or the terminal crashes, check the last BPW it was on and enter the remaining quants you wanted. It should be able to pick up where it left off. Don't type the BPW of completed quants as it will start from the beginning. You may also use ctrl + c to pause at any time during the quant process.
 
-To add more options to the quantization process, you can add them to line
+To add more options to the quantization process, you can add them to line 189. All options: https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
 
 Things may break in the future as it downloads the latest version of all the dependencies which may either change names or how they work. If something breaks, please open a discussion at https://huggingface.co/Anthonyg5005/hf-scripts/discussions
 
@@ -46,4 +46,4 @@ https://github.com/oobabooga
 Credit to Lucain Pouget for maintaining huggingface-hub.
 https://github.com/Wauplin
 
-Only tested with CUDA 12.1 on Windows 11
+Only tested with CUDA 12.1 on Windows 11 and WSL2 Ubuntu 24.04
auto-exl2-upload/exl2-quant.py CHANGED

@@ -114,9 +114,13 @@ while priv2pub != 'y' and priv2pub != 'n':
 clear_screen()
 
 #ask to delete original fp16 weights
-delmodel = input("Do you want to delete the original model? (Won't delete if paused or failed) (y/
+delmodel = input("Do you want to delete the original model? (Won't delete if paused or failed) (y/N): ")
+if delmodel == '':
+    delmodel = 'n'
 while delmodel != 'y' and delmodel != 'n':
     delmodel = input("Please enter 'y' or 'n': ")
+    if delmodel == '':
+        delmodel = 'n'
 clear_screen()
 
 #downloading the model
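The new prompt logic above repeats the empty-input check in two places so that pressing enter maps to "no". The same behavior can be sketched as a single standalone function (a hypothetical helper for illustration, not part of the commit):

```python
def ask_yes_no(prompt: str, default: str = 'n') -> str:
    """Ask a y/n question; an empty answer falls back to the default."""
    answer = input(prompt).strip().lower()
    if answer == '':
        answer = default
    while answer not in ('y', 'n'):
        answer = input("Please enter 'y' or 'n': ").strip().lower()
        if answer == '':
            answer = default
    return answer
```

With this, the script's prompt would reduce to something like `delmodel = ask_yes_no("Do you want to delete the original model? (y/N): ")`.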
auto-exl2-upload/linux-setup.sh CHANGED

@@ -4,11 +4,15 @@
 
 # check if "venv" subdirectory exists, if not, create one
 if [ ! -d "venv" ]; then
-
+    python -m venv venv
 else
-
-
-
+    read -p "venv directory already exists. Looking to upgrade/reinstall exllama? (will reinstall python venv) (y/n) " reinst
+    if [ "$reinst" = "y" ]; then
+        rm -rf venv
+        python -m venv venv
+    else
+        exit
+    fi
 fi
 
 # ask if the user has git installed
@@ -17,7 +21,9 @@ read -p "Do you have git and wget installed? (y/n) " gitwget
 if [ "$gitwget" = "y" ]; then
     echo "Setting up environment"
 else
-    echo "Please install git and wget before running this script."
+    echo "Please install git and wget from your distro's package manager before running this script."
+    echo "Example for Debian-based: sudo apt-get install git wget"
+    echo "Example for Arch-based: sudo pacman -S git wget"
     read -p "Press enter to continue"
     exit
 fi
@@ -33,6 +39,15 @@ fi
 # if CUDA version 12 install pytorch for 12.1, else if CUDA 11 install pytorch for 11.8. If ROCm, install pytorch for ROCm 5.7
 read -p "Please enter your GPU compute version, CUDA 11/12 or AMD ROCm (11, 12, rocm): " pytorch_version
 
+# ask to install flash attention
+echo "Flash attention is a feature that could fix overflow issues on some more broken models."
+read -p "Would you like to install flash-attention? (rarely needed and optional) (y/n) " flash_attention
+if [ "$flash_attention" != "y" ] && [ "$flash_attention" != "n" ]; then
+    echo "Invalid input. Please enter y or n."
+    read -p "Press enter to continue"
+    exit
+fi
+
 if [ "$pytorch_version" = "11" ]; then
     echo "Installing PyTorch for CUDA 11.8"
     venv/bin/python -m pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade
@@ -54,6 +69,7 @@ rm download-model.py
 rm -rf exllamav2
 rm start-quant.sh
 rm enter-venv.sh
+rm -rf flash-attention
 
 # download stuff
 echo "Downloading files"
@@ -71,6 +87,14 @@ venv/bin/python -m pip install -r exllamav2/requirements.txt
 venv/bin/python -m pip install huggingface-hub transformers accelerate
 venv/bin/python -m pip install ./exllamav2
 
+if [ "$flash_attention" = "y" ]; then
+    echo "Installing flash-attention..."
+    echo "If failed, retry without flash-attention."
+    git clone https://github.com/Dao-AILab/flash-attention
+    venv/bin/python -m pip install ./flash-attention
+    rm -rf flash-attention
+fi
+
 # create start-quant.sh
 echo "#!/bin/bash" > start-quant.sh
 echo "venv/bin/python exl2-quant.py" >> start-quant.sh
@@ -86,4 +110,4 @@ chmod +x enter-venv.sh
 echo "If you use ctrl+c to stop, you may need to also use 'pkill python' to stop running scripts."
 echo "Environment setup complete. run start-quant.sh to start the quantization process."
 read -p "Press enter to exit"
-exit
+exit
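The venv branch added above encodes a three-way decision: create a fresh venv, recreate it when the user confirms a reinstall, or bail out. That decision table can be sketched as a small pure function (hypothetical names, Python used only for illustration of the shell logic):

```python
def venv_action(venv_exists: bool, reinstall_answer: str = '') -> str:
    """Mirror the setup script's venv branch.

    Returns 'create' when no venv exists, 'recreate' when the user
    confirmed a reinstall with 'y', and 'exit' otherwise.
    """
    if not venv_exists:
        return 'create'
    if reinstall_answer == 'y':
        return 'recreate'
    return 'exit'
```

Keeping the decision separate from the side effects (`rm -rf venv`, `python -m venv venv`) makes the branch easy to test without touching the filesystem.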
auto-exl2-upload/windows-setup.bat CHANGED

@@ -6,8 +6,12 @@ REM check if "venv" subdirectory exists, if not, create one
 if not exist "venv\" (
     python -m venv venv
 ) else (
-
-
+    set /p reinst="venv directory already exists. Looking to upgrade/reinstall exllama? (will reinstall python venv) (y/n) "
+)
+if "%reinst%"=="y" (
+    rmdir /s /q venv
+    python -m venv venv
+) else (
     exit
 )
 
@@ -36,6 +40,15 @@ echo CUDA compilers:
 where nvcc
 set /p cuda_version="Please enter your CUDA version (11 or 12): "
 
+REM ask to install flash attention
+echo Flash attention is a feature that could fix overflow issues on some more broken models. However it will increase install time by a few hours.
+set /p flash_attention="Would you like to install flash-attention? (rarely needed and optional) (y/n) "
+if not "%flash_attention%"=="y" if not "%flash_attention%"=="n" (
+    echo Invalid input. Please enter y or n.
+    pause
+    exit
+)
+
 if "%cuda_version%"=="11" (
     echo Installing PyTorch for CUDA 11.8...
     venv\scripts\python.exe -m pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade
@@ -48,13 +61,13 @@ if "%cuda_version%"=="11" (
     exit
 )
 
-
 echo Deleting potential conflicting files
 del convert-to-safetensors.py
 del download-model.py
 rmdir /s /q exllamav2
 del start-quant.sh
 del enter-venv.sh
+rmdir /s /q flash-attention
 
 REM download stuff
 echo Downloading files...
@@ -72,6 +85,14 @@ venv\scripts\python.exe -m pip install -r exllamav2/requirements.txt
 venv\scripts\python.exe -m pip install huggingface-hub transformers accelerate
 venv\scripts\python.exe -m pip install .\exllamav2
 
+if "%flash_attention%"=="y" (
+    echo Installing flash-attention. Go watch some movies, this will take a while...
+    echo If failed, retry without flash-attention.
+    git clone https://github.com/Dao-AILab/flash-attention
+    venv\scripts\python.exe -m pip install .\flash-attention
+    rmdir /s /q flash-attention
+)
+
 REM create start-quant-windows.bat
 echo @echo off > start-quant.bat
 echo venv\scripts\python.exe exl2-quant.py >> start-quant.bat
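Both setup scripts gate the flash-attention build behind the same three steps: clone the repo, pip-install it from source, then delete the clone. Sketched as a command plan in Python (a hypothetical helper; the command strings just restate what the scripts run, nothing here is an official flash-attention API):

```python
def flash_attn_plan(install: bool, python_bin: str = 'venv/bin/python') -> list:
    """Return the commands the setup step would run for optional
    flash-attention: clone, build/install from source, clean up.
    Returns an empty plan when the user declined the install."""
    if not install:
        return []
    return [
        ['git', 'clone', 'https://github.com/Dao-AILab/flash-attention'],
        [python_bin, '-m', 'pip', 'install', './flash-attention'],
        ['rm', '-rf', 'flash-attention'],
    ]
```

Each command list could be handed to `subprocess.run` on Linux; the Windows script uses `rmdir /s /q` instead of `rm -rf` but is otherwise the same sequence.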
exl2-multi-quant-local/INSTRUCTIONS.txt CHANGED

@@ -1,14 +1,14 @@
 For NVIDIA cards install the CUDA toolkit
 
 Nvidia Maxwell or higher
-https://developer.nvidia.com/cuda-
+https://developer.nvidia.com/cuda-12-1-0-download-archive
 
 Nvidia Kepler or higher
 https://developer.nvidia.com/cuda-11-8-0-download-archive
 
 Restart your computer after installing the CUDA toolkit to make sure the PATH is set correctly.
 
-
+Visual Studio with desktop development for C++ is required.
 https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=community&rel=16&utm_medium=microsoft&utm_campaign=download+from+relnotes&utm_content=vs2019ga+button
 install the desktop development for C++ workload
 
@@ -19,11 +19,11 @@ For example, on Ubuntu use: sudo apt-get install build-essential
 
 This may work with AMD cards but only on linux and possibly WSL2. I can't guarantee that it will work on AMD cards, I personally don't have one to test with. You may need to install stuff before starting. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html
 
-Only python 3.8 - 3.
+Only python 3.8 - 3.12 is known to work. If you have a higher/lower version of python, I can't guarantee that it will work.
 
 
 
-First setup your environment by using either windows.bat or linux.sh.
+First setup your environment by using either windows.bat or linux.sh.
 
 After setup is complete then you'll have a file called start-quant. Use this to run the quant script.
 
@@ -32,7 +32,7 @@ Make sure to also have a lot of RAM depending on the model. Have noticed gemma t
 
 If you close the terminal or the terminal crashes, check the last BPW it was on and enter the remaining quants you wanted. It should be able to pick up where it left off. Don't type the BPW of completed quants as it will start from the beginning. You may also use ctrl + c to pause at any time during the quant process.
 
-To add more options to the quantization process, you can add them to line
+To add more options to the quantization process, you can add them to line 140. All options: https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
 
 Things may break in the future as it downloads the latest version of all the dependencies which may either change names or how they work. If something breaks, please open a discussion at https://huggingface.co/Anthonyg5005/hf-scripts/discussions
 
@@ -46,4 +46,4 @@ https://github.com/oobabooga
 Credit to Lucain Pouget for maintaining huggingface-hub.
 https://github.com/Wauplin
 
-Only tested with CUDA 12.1 on Windows 11
+Only tested with CUDA 12.1 on Windows 11 and WSL2 Ubuntu 24.04
exl2-multi-quant-local/exl2-quant.py CHANGED

@@ -85,9 +85,13 @@ bpwvalue = list(qnum.values())
 bpwvalue.sort()
 
 #ask to delete fp16 after done
-delmodel = input("Do you want to delete the original model
+delmodel = input("Do you want to delete the original model? (Won't delete if paused or failed) (y/N): ")
+if delmodel == '':
+    delmodel = 'n'
 while delmodel != 'y' and delmodel != 'n':
     delmodel = input("Please enter 'y' or 'n': ")
+    if delmodel == '':
+        delmodel = 'n'
 if delmodel == 'y':
     print(f"Deleting dir models/{model} after quants are finished.")
     time.sleep(3)
exl2-multi-quant-local/linux-setup.sh CHANGED

@@ -4,11 +4,15 @@
 
 # check if "venv" subdirectory exists, if not, create one
 if [ ! -d "venv" ]; then
-
+    python -m venv venv
 else
-
-
-
+    read -p "venv directory already exists. Looking to upgrade/reinstall exllama? (will reinstall python venv) (y/n) " reinst
+    if [ "$reinst" = "y" ]; then
+        rm -rf venv
+        python -m venv venv
+    else
+        exit
+    fi
 fi
 
 # ask if the user has git installed
@@ -17,7 +21,9 @@ read -p "Do you have git and wget installed? (y/n) " gitwget
 if [ "$gitwget" = "y" ]; then
     echo "Setting up environment"
 else
-    echo "Please install git and wget before running this script."
+    echo "Please install git and wget from your distro's package manager before running this script."
+    echo "Example for Debian-based: sudo apt-get install git wget"
+    echo "Example for Arch-based: sudo pacman -S git wget"
     read -p "Press enter to continue"
     exit
 fi
@@ -33,6 +39,15 @@ fi
 # if CUDA version 12 install pytorch for 12.1, else if CUDA 11 install pytorch for 11.8. If ROCm, install pytorch for ROCm 5.7
 read -p "Please enter your GPU compute version, CUDA 11/12 or AMD ROCm (11, 12, rocm): " pytorch_version
 
+# ask to install flash attention
+echo "Flash attention is a feature that could fix overflow issues on some more broken models."
+read -p "Would you like to install flash-attention? (rarely needed and optional) (y/n) " flash_attention
+if [ "$flash_attention" != "y" ] && [ "$flash_attention" != "n" ]; then
+    echo "Invalid input. Please enter y or n."
+    read -p "Press enter to continue"
+    exit
+fi
+
 if [ "$pytorch_version" = "11" ]; then
     echo "Installing PyTorch for CUDA 11.8"
     venv/bin/python -m pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade
@@ -54,6 +69,7 @@ rm download-model.py
 rm -rf exllamav2
 rm start-quant.sh
 rm enter-venv.sh
+rm -rf flash-attention
 
 # download stuff
 echo "Downloading files"
@@ -71,6 +87,14 @@ venv/bin/python -m pip install -r exllamav2/requirements.txt
 venv/bin/python -m pip install huggingface-hub transformers accelerate
 venv/bin/python -m pip install ./exllamav2
 
+if [ "$flash_attention" = "y" ]; then
+    echo "Installing flash-attention..."
+    echo "If failed, retry without flash-attention."
+    git clone https://github.com/Dao-AILab/flash-attention
+    venv/bin/python -m pip install ./flash-attention
+    rm -rf flash-attention
+fi
+
 # create start-quant.sh
 echo "#!/bin/bash" > start-quant.sh
 echo "venv/bin/python exl2-quant.py" >> start-quant.sh
exl2-multi-quant-local/windows-setup.bat CHANGED

@@ -6,8 +6,12 @@ REM check if "venv" subdirectory exists, if not, create one
 if not exist "venv\" (
     python -m venv venv
 ) else (
-
-
+    set /p reinst="venv directory already exists. Looking to upgrade/reinstall exllama? (will reinstall python venv) (y/n) "
+)
+if "%reinst%"=="y" (
+    rmdir /s /q venv
+    python -m venv venv
+) else (
     exit
 )
 
@@ -36,6 +40,15 @@ echo CUDA compilers:
 where nvcc
 set /p cuda_version="Please enter your CUDA version (11 or 12): "
 
+REM ask to install flash attention
+echo Flash attention is a feature that could fix overflow issues on some more broken models. However it will increase install time by a few hours.
+set /p flash_attention="Would you like to install flash-attention? (rarely needed and optional) (y/n) "
+if not "%flash_attention%"=="y" if not "%flash_attention%"=="n" (
+    echo Invalid input. Please enter y or n.
+    pause
+    exit
+)
+
 if "%cuda_version%"=="11" (
     echo Installing PyTorch for CUDA 11.8...
     venv\scripts\python.exe -m pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade
@@ -54,6 +67,7 @@ del download-model.py
 rmdir /s /q exllamav2
 del start-quant.sh
 del enter-venv.sh
+rmdir /s /q flash-attention
 
 REM download stuff
 echo Downloading files...
@@ -71,6 +85,14 @@ venv\scripts\python.exe -m pip install -r exllamav2/requirements.txt
 venv\scripts\python.exe -m pip install huggingface-hub transformers accelerate
 venv\scripts\python.exe -m pip install .\exllamav2
 
+if "%flash_attention%"=="y" (
+    echo Installing flash-attention. Go watch some movies, this will take a while...
+    echo If failed, retry without flash-attention.
+    git clone https://github.com/Dao-AILab/flash-attention
+    venv\scripts\python.exe -m pip install .\flash-attention
+    rmdir /s /q flash-attention
+)
+
 REM create start-quant-windows.bat
 echo @echo off > start-quant.bat
 echo venv\scripts\python.exe exl2-quant.py >> start-quant.bat