Anthonyg5005 committed
Commit 9307bce · 1 Parent(s): 52333b0

add optional flash-attn


also added default no to delete model

auto-exl2-upload/INSTRUCTIONS.txt CHANGED
@@ -8,7 +8,7 @@ https://developer.nvidia.com/cuda-11-8-0-download-archive
 
 Restart your computer after installing the CUDA toolkit to make sure the PATH is set correctly.
 
-Haven't done much testing but for Windows, Visual Studio 2019 with desktop development for C++ might be required.
+Visual Studio with desktop development for C++ is required.
 https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=community&rel=16&utm_medium=microsoft&utm_campaign=download+from+relnotes&utm_content=vs2019ga+button
 install the desktop development for C++ workload
 
@@ -19,11 +19,11 @@ For example, on Ubuntu use: sudo apt-get install build-essential
 
 This may work with AMD cards but only on linux and possibly WSL2. I can't guarantee that it will work on AMD cards, I personally don't have one to test with. You may need to install stuff before starting. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html
 
-Only python 3.8 - 3.11 is known to work. If you have a higher version of python, I can't guarantee that it will work.
+Only python 3.8 - 3.12 is known to work. If you have a higher/lower version of python, I can't guarantee that it will work.
 
 
 
-First setup your environment by using either windows.bat or linux.sh. If something fails during setup, then delete venv folder and try again.
+First setup your environment by using either windows.bat or linux.sh.
 
 After setup is complete then you'll have a file called start-quant. Use this to run the quant script.
 
@@ -32,7 +32,7 @@ Make sure to also have a lot of RAM depending on the model. Have noticed gemma t
 
 If you close the terminal or the terminal crashes, check the last BPW it was on and enter the remaining quants you wanted. It should be able to pick up where it left off. Don't type the BPW of completed quants as it will start from the beginning. You may also use ctrl + c to pause at any time during the quant process.
 
-To add more options to the quantization process, you can add them to line 174. All options: https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
+To add more options to the quantization process, you can add them to line 189. All options: https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
 
 Things may break in the future as it downloads the latest version of all the dependencies which may either change names or how they work. If something breaks, please open a discussion at https://huggingface.co/Anthonyg5005/hf-scripts/discussions
 
@@ -46,4 +46,4 @@ https://github.com/oobabooga
 Credit to Lucain Pouget for maintaining huggingface-hub.
 https://github.com/Wauplin
 
-Only tested with CUDA 12.1 on Windows 11
+Only tested with CUDA 12.1 on Windows 11 and WSL2 Ubuntu 24.04
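The "add more options" note refers to extending the exllamav2 convert invocation inside exl2-quant.py. A minimal sketch of the idea, assuming a subprocess-style command list (the paths and flag values here are illustrative, not the script's actual ones; consult doc/convert.md for the real option set):

```python
# Hypothetical sketch: extra convert.md flags are appended to the command list
# before it is executed. "-hb 8" (head bits) is one example option.
base_cmd = ["venv/bin/python", "exllamav2/convert.py",
            "-i", "models/some-model", "-cf", "models/some-model-exl2", "-b", "4.0"]
extra_options = ["-hb", "8"]  # any flags from exllamav2's doc/convert.md
cmd = base_cmd + extra_options
```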
auto-exl2-upload/exl2-quant.py CHANGED
@@ -114,9 +114,13 @@ while priv2pub != 'y' and priv2pub != 'n':
 clear_screen()
 
 #ask to delete original fp16 weights
-delmodel = input("Do you want to delete the original model? (Won't delete if paused or failed) (y/n): ")
+delmodel = input("Do you want to delete the original model? (Won't delete if paused or failed) (y/N): ")
+if delmodel == '':
+    delmodel = 'n'
 while delmodel != 'y' and delmodel != 'n':
     delmodel = input("Please enter 'y' or 'n': ")
+    if delmodel == '':
+        delmodel = 'n'
 clear_screen()
 
 #downloading the model
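The change above makes an empty answer default to 'n'. The same pattern can be factored into a small helper; a minimal sketch, assuming nothing about the script beyond the prompt behavior (the `ask_yes_no` name and the injectable `read` parameter are mine, added for testability):

```python
def ask_yes_no(prompt, default='n', read=input):
    """Ask a y/n question; an empty answer falls back to the default
    ('n' here, matching the script's new y/N behavior)."""
    answer = read(prompt).strip().lower()
    while True:
        if answer == '':
            return default
        if answer in ('y', 'n'):
            return answer
        answer = read("Please enter 'y' or 'n': ").strip().lower()
```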
auto-exl2-upload/linux-setup.sh CHANGED
@@ -4,11 +4,15 @@
 
 # check if "venv" subdirectory exists, if not, create one
 if [ ! -d "venv" ]; then
-    python3 -m venv venv
+    python -m venv venv
 else
-    echo "venv directory already exists. If something is broken, delete venv folder and run this script again."
-    read -p "Press enter to continue"
-    exit
+    read -p "venv directory already exists. Looking to upgrade/reinstall exllama? (will reinstall python venv) (y/n) " reinst
+    if [ "$reinst" = "y" ]; then
+        rm -rf venv
+        python -m venv venv
+    else
+        exit
+    fi
 fi
 
 # ask if the user has git installed
@@ -17,7 +21,9 @@ read -p "Do you have git and wget installed? (y/n) " gitwget
 if [ "$gitwget" = "y" ]; then
     echo "Setting up environment"
 else
-    echo "Please install git and wget before running this script."
+    echo "Please install git and wget from your distro's package manager before running this script."
+    echo "Example for Debian-based: sudo apt-get install git wget"
+    echo "Example for Arch-based: sudo pacman -S git wget"
     read -p "Press enter to continue"
     exit
 fi
@@ -33,6 +39,15 @@ fi
 # if CUDA version 12 install pytorch for 12.1, else if CUDA 11 install pytorch for 11.8. If ROCm, install pytorch for ROCm 5.7
 read -p "Please enter your GPU compute version, CUDA 11/12 or AMD ROCm (11, 12, rocm): " pytorch_version
 
+# ask to install flash attention
+echo "Flash attention is a feature that could fix overflow issues on some more broken models."
+read -p "Would you like to install flash-attention? (rarely needed and optional) (y/n) " flash_attention
+if [ "$flash_attention" != "y" ] && [ "$flash_attention" != "n" ]; then
+    echo "Invalid input. Please enter y or n."
+    read -p "Press enter to continue"
+    exit
+fi
+
 if [ "$pytorch_version" = "11" ]; then
     echo "Installing PyTorch for CUDA 11.8"
     venv/bin/python -m pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade
@@ -54,6 +69,7 @@ rm download-model.py
 rm -rf exllamav2
 rm start-quant.sh
 rm enter-venv.sh
+rm -rf flash-attention
 
 # download stuff
 echo "Downloading files"
@@ -71,6 +87,14 @@ venv/bin/python -m pip install -r exllamav2/requirements.txt
 venv/bin/python -m pip install huggingface-hub transformers accelerate
 venv/bin/python -m pip install ./exllamav2
 
+if [ "$flash_attention" = "y" ]; then
+    echo "Installing flash-attention..."
+    echo "If failed, retry without flash-attention."
+    git clone https://github.com/Dao-AILab/flash-attention
+    venv/bin/python -m pip install ./flash-attention
+    rm -rf flash-attention
+fi
+
 # create start-quant.sh
 echo "#!/bin/bash" > start-quant.sh
 echo "venv/bin/python exl2-quant.py" >> start-quant.sh
@@ -86,4 +110,4 @@ chmod +x enter-venv.sh
 echo "If you use ctrl+c to stop, you may need to also use 'pkill python' to stop running scripts."
 echo "Environment setup complete. run start-quant.sh to start the quantization process."
 read -p "Press enter to exit"
-exit
+exit
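The flash-attention flow added above follows one shape: validate the y/n answer up front (before the long PyTorch install), then only clone and build at the end if the user opted in. A Python sketch of that decision logic, using a hypothetical helper that just returns the commands the script would run:

```python
FLASH_ATTN_REPO = "https://github.com/Dao-AILab/flash-attention"

def flash_attn_commands(choice, python_bin="venv/bin/python"):
    """Return the commands the setup would run for the optional
    flash-attention step; [] when the user declines.
    (Helper is illustrative; the real script inlines this in shell.)"""
    if choice not in ('y', 'n'):
        # the real script prints an error and exits here
        raise ValueError("Invalid input. Please enter y or n.")
    if choice == 'n':
        return []
    return [
        f"git clone {FLASH_ATTN_REPO}",
        f"{python_bin} -m pip install ./flash-attention",
        "rm -rf flash-attention",
    ]
```

Validating before the slow steps means a typo costs seconds, not the hours a flash-attention build can take.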
auto-exl2-upload/windows-setup.bat CHANGED
@@ -6,8 +6,12 @@ REM check if "venv" subdirectory exists, if not, create one
 if not exist "venv\" (
     python -m venv venv
 ) else (
-    echo venv directory already exists. If something is broken, delete everything but exl2-quant.py and run this script again.
-    pause
+    set /p reinst="venv directory already exists. Looking to upgrade/reinstall exllama? (will reinstall python venv) (y/n) "
+)
+if "%reinst%"=="y" (
+    rmdir /s /q venv
+    python -m venv venv
+) else (
     exit
 )
 
@@ -36,6 +40,15 @@ echo CUDA compilers:
 where nvcc
 set /p cuda_version="Please enter your CUDA version (11 or 12): "
 
+REM ask to install flash attention
+echo Flash attention is a feature that could fix overflow issues on some more broken models. However it will increase install time by a few hours.
+set /p flash_attention="Would you like to install flash-attention? (rarely needed and optional) (y/n) "
+if not "%flash_attention%"=="y" if not "%flash_attention%"=="n" (
+    echo Invalid input. Please enter y or n.
+    pause
+    exit
+)
+
 if "%cuda_version%"=="11" (
     echo Installing PyTorch for CUDA 11.8...
     venv\scripts\python.exe -m pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade
@@ -48,13 +61,13 @@ if "%cuda_version%"=="11" (
     exit
 )
 
-
 echo Deleting potential conflicting files
 del convert-to-safetensors.py
 del download-model.py
 rmdir /s /q exllamav2
 del start-quant.sh
 del enter-venv.sh
+rmdir /s /q flash-attention
 
 REM download stuff
 echo Downloading files...
@@ -72,6 +85,14 @@ venv\scripts\python.exe -m pip install -r exllamav2/requirements.txt
 venv\scripts\python.exe -m pip install huggingface-hub transformers accelerate
 venv\scripts\python.exe -m pip install .\exllamav2
 
+if "%flash_attention%"=="y" (
+    echo Installing flash-attention. Go watch some movies, this will take a while...
+    echo If failed, retry without flash-attention.
+    git clone https://github.com/Dao-AILab/flash-attention
+    venv\scripts\python.exe -m pip install .\flash-attention
+    rmdir /s /q flash-attention
+)
+
 REM create start-quant-windows.bat
 echo @echo off > start-quant.bat
 echo venv\scripts\python.exe exl2-quant.py >> start-quant.bat
exl2-multi-quant-local/INSTRUCTIONS.txt CHANGED
@@ -1,14 +1,14 @@
 For NVIDIA cards install the CUDA toolkit
 
 Nvidia Maxwell or higher
-https://developer.nvidia.com/cuda-downloads
+https://developer.nvidia.com/cuda-12-1-0-download-archive
 
 Nvidia Kepler or higher
 https://developer.nvidia.com/cuda-11-8-0-download-archive
 
 Restart your computer after installing the CUDA toolkit to make sure the PATH is set correctly.
 
-Haven't done much testing but for Windows, Visual Studio 2019 with desktop development for C++ might be required.
+Visual Studio with desktop development for C++ is required.
 https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=community&rel=16&utm_medium=microsoft&utm_campaign=download+from+relnotes&utm_content=vs2019ga+button
 install the desktop development for C++ workload
 
@@ -19,11 +19,11 @@ For example, on Ubuntu use: sudo apt-get install build-essential
 
 This may work with AMD cards but only on linux and possibly WSL2. I can't guarantee that it will work on AMD cards, I personally don't have one to test with. You may need to install stuff before starting. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html
 
-Only python 3.8 - 3.11 is known to work. If you have a higher version of python, I can't guarantee that it will work.
+Only python 3.8 - 3.12 is known to work. If you have a higher/lower version of python, I can't guarantee that it will work.
 
 
 
-First setup your environment by using either windows.bat or linux.sh. If something fails during setup, then delete venv folder and try again.
+First setup your environment by using either windows.bat or linux.sh.
 
 After setup is complete then you'll have a file called start-quant. Use this to run the quant script.
 
@@ -32,7 +32,7 @@ Make sure to also have a lot of RAM depending on the model. Have noticed gemma t
 
 If you close the terminal or the terminal crashes, check the last BPW it was on and enter the remaining quants you wanted. It should be able to pick up where it left off. Don't type the BPW of completed quants as it will start from the beginning. You may also use ctrl + c to pause at any time during the quant process.
 
-To add more options to the quantization process, you can add them to line 136. All options: https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
+To add more options to the quantization process, you can add them to line 140. All options: https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
 
 Things may break in the future as it downloads the latest version of all the dependencies which may either change names or how they work. If something breaks, please open a discussion at https://huggingface.co/Anthonyg5005/hf-scripts/discussions
 
@@ -46,4 +46,4 @@ https://github.com/oobabooga
 Credit to Lucain Pouget for maintaining huggingface-hub.
 https://github.com/Wauplin
 
-Only tested with CUDA 12.1 on Windows 11
+Only tested with CUDA 12.1 on Windows 11 and WSL2 Ubuntu 24.04
exl2-multi-quant-local/exl2-quant.py CHANGED
@@ -85,9 +85,13 @@ bpwvalue = list(qnum.values())
 bpwvalue.sort()
 
 #ask to delete fp16 after done
-delmodel = input("Do you want to delete the original model after finishing? (Won't delete if paused or failed) (y/n): ")
+delmodel = input("Do you want to delete the original model? (Won't delete if paused or failed) (y/N): ")
+if delmodel == '':
+    delmodel = 'n'
 while delmodel != 'y' and delmodel != 'n':
     delmodel = input("Please enter 'y' or 'n': ")
+    if delmodel == '':
+        delmodel = 'n'
 if delmodel == 'y':
     print(f"Deleting dir models/{model} after quants are finished.")
     time.sleep(3)
exl2-multi-quant-local/linux-setup.sh CHANGED
@@ -4,11 +4,15 @@
 
 # check if "venv" subdirectory exists, if not, create one
 if [ ! -d "venv" ]; then
-    python3 -m venv venv
+    python -m venv venv
 else
-    echo "venv directory already exists. If something is broken, delete everything but exl2-quant.py and run this script again."
-    read -p "Press enter to continue"
-    exit
+    read -p "venv directory already exists. Looking to upgrade/reinstall exllama? (will reinstall python venv) (y/n) " reinst
+    if [ "$reinst" = "y" ]; then
+        rm -rf venv
+        python -m venv venv
+    else
+        exit
+    fi
 fi
 
 # ask if the user has git installed
@@ -17,7 +21,9 @@ read -p "Do you have git and wget installed? (y/n) " gitwget
 if [ "$gitwget" = "y" ]; then
     echo "Setting up environment"
 else
-    echo "Please install git and wget before running this script."
+    echo "Please install git and wget from your distro's package manager before running this script."
+    echo "Example for Debian-based: sudo apt-get install git wget"
+    echo "Example for Arch-based: sudo pacman -S git wget"
     read -p "Press enter to continue"
     exit
 fi
@@ -33,6 +39,15 @@ fi
 # if CUDA version 12 install pytorch for 12.1, else if CUDA 11 install pytorch for 11.8. If ROCm, install pytorch for ROCm 5.7
 read -p "Please enter your GPU compute version, CUDA 11/12 or AMD ROCm (11, 12, rocm): " pytorch_version
 
+# ask to install flash attention
+echo "Flash attention is a feature that could fix overflow issues on some more broken models."
+read -p "Would you like to install flash-attention? (rarely needed and optional) (y/n) " flash_attention
+if [ "$flash_attention" != "y" ] && [ "$flash_attention" != "n" ]; then
+    echo "Invalid input. Please enter y or n."
+    read -p "Press enter to continue"
+    exit
+fi
+
 if [ "$pytorch_version" = "11" ]; then
     echo "Installing PyTorch for CUDA 11.8"
     venv/bin/python -m pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade
@@ -54,6 +69,7 @@ rm download-model.py
 rm -rf exllamav2
 rm start-quant.sh
 rm enter-venv.sh
+rm -rf flash-attention
 
 # download stuff
 echo "Downloading files"
@@ -71,6 +87,14 @@ venv/bin/python -m pip install -r exllamav2/requirements.txt
 venv/bin/python -m pip install huggingface-hub transformers accelerate
 venv/bin/python -m pip install ./exllamav2
 
+if [ "$flash_attention" = "y" ]; then
+    echo "Installing flash-attention..."
+    echo "If failed, retry without flash-attention."
+    git clone https://github.com/Dao-AILab/flash-attention
+    venv/bin/python -m pip install ./flash-attention
+    rm -rf flash-attention
+fi
+
 # create start-quant.sh
 echo "#!/bin/bash" > start-quant.sh
 echo "venv/bin/python exl2-quant.py" >> start-quant.sh
exl2-multi-quant-local/windows-setup.bat CHANGED
@@ -6,8 +6,12 @@ REM check if "venv" subdirectory exists, if not, create one
 if not exist "venv\" (
     python -m venv venv
 ) else (
-    echo venv directory already exists. If something is broken, delete everything but exl2-quant.py and run this script again.
-    pause
+    set /p reinst="venv directory already exists. Looking to upgrade/reinstall exllama? (will reinstall python venv) (y/n) "
+)
+if "%reinst%"=="y" (
+    rmdir /s /q venv
+    python -m venv venv
+) else (
     exit
 )
 
@@ -36,6 +40,15 @@ echo CUDA compilers:
 where nvcc
 set /p cuda_version="Please enter your CUDA version (11 or 12): "
 
+REM ask to install flash attention
+echo Flash attention is a feature that could fix overflow issues on some more broken models. However it will increase install time by a few hours.
+set /p flash_attention="Would you like to install flash-attention? (rarely needed and optional) (y/n) "
+if not "%flash_attention%"=="y" if not "%flash_attention%"=="n" (
+    echo Invalid input. Please enter y or n.
+    pause
+    exit
+)
+
 if "%cuda_version%"=="11" (
     echo Installing PyTorch for CUDA 11.8...
     venv\scripts\python.exe -m pip install torch --index-url https://download.pytorch.org/whl/cu118 --upgrade
@@ -54,6 +67,7 @@ del download-model.py
 rmdir /s /q exllamav2
 del start-quant.sh
 del enter-venv.sh
+rmdir /s /q flash-attention
 
 REM download stuff
 echo Downloading files...
@@ -71,6 +85,14 @@ venv\scripts\python.exe -m pip install -r exllamav2/requirements.txt
 venv\scripts\python.exe -m pip install huggingface-hub transformers accelerate
 venv\scripts\python.exe -m pip install .\exllamav2
 
+if "%flash_attention%"=="y" (
+    echo Installing flash-attention. Go watch some movies, this will take a while...
+    echo If failed, retry without flash-attention.
+    git clone https://github.com/Dao-AILab/flash-attention
+    venv\scripts\python.exe -m pip install .\flash-attention
+    rmdir /s /q flash-attention
+)
+
 REM create start-quant-windows.bat
 echo @echo off > start-quant.bat
 echo venv\scripts\python.exe exl2-quant.py >> start-quant.bat