ZLUDA (CUDA Wrapper) for AMD GPUs in Windows
### Warning
ZLUDA's official build does not fully support PyTorch, so ZLUDA support in SD.Next is tricky and unstable, and remains limited at this time.
Please don't create issues regarding ZLUDA on GitHub. Instead, feel free to reach out via the ZLUDA thread in the help channel on Discord.
## Installing ZLUDA for AMD GPUs in Windows.
#### Note
_This guide assumes you have [git and python](https://github.com/vladmandic/automatic/wiki/Installation#install-python-and-git) installed, have used SD.Next before, and are comfortable using the command prompt, navigating Windows Explorer, renaming files and folders, and working with zip files._
#### Compatible GPUs
A list of compatible GPUs can be found [here](https://rocm.docs.amd.com/projects/install-on-windows/en/develop/reference/system-requirements.html).
If your GPU is not on the list (integrated GPUs included), you may need to build your own rocBLAS libraries; please follow the instructions in the [ROCm Support guide](https://github.com/vladmandic/automatic/wiki/Rocm-Support).
Note: If you have an integrated GPU (iGPU), you may need to disable it, or use the `HIP_VISIBLE_DEVICES` environment variable. Learn more [here](https://github.com/vosen/ZLUDA?tab=readme-ov-file#hardware).
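For example, here is a minimal, hypothetical sketch of hiding the iGPU from HIP/ZLUDA in a command prompt before launching SD.Next later on. The device index `1` is only an assumption; the ordering varies per system, so check which index your discrete GPU actually has:

```bat
:: Hypothetical example: expose only device 1 (assumed to be the discrete GPU) to HIP/ZLUDA.
set HIP_VISIBLE_DEVICES=1
:: Then launch SD.Next as usual (run from your SD.Next install folder).
webui.bat --use-zluda --debug --autolaunch
```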
### Install Visual C++ Runtime
_Note: Most people will already have this installed, since it ships with a lot of games, but there's no harm in running the installer again._
Grab the latest version of Visual C++ Runtime from https://aka.ms/vs/17/release/vc_redist.x64.exe (this is a direct download link) and then run it.
If you get the options to Repair or Uninstall, then you already have it installed and can click Close. Otherwise, install it.
### Install ZLUDA
ZLUDA is now auto-installed, and automatically added to PATH, when starting webui.bat with `--use-zluda`.
### Install HIP SDK
Install HIP SDK 5.7 from https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html
As long as your regular AMD GPU driver is up to date, you don't need to install the PRO driver that the HIP SDK installer suggests.
### Add folders to PATH
Add `%HIP_PATH%bin` to your PATH.
https://github.com/brknsoul/ROCmLibs/wiki/Adding-folders-to-PATH
_Note: `%HIP_PATH%bin` typically resolves to `C:\Program Files\AMD\ROCm\5.7\bin`, assuming Windows is installed on C:._
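To confirm the folder actually ended up on your PATH, open a _new_ command prompt (PATH changes don't apply to windows that were already open) and check that the HIP binaries can be found. This is just a sanity-check sketch, assuming the default HIP SDK 5.7 install location and that your SDK build ships `hipInfo.exe` in its bin folder:

```bat
:: Should print the ROCm install path, e.g. C:\Program Files\AMD\ROCm\5.7\
echo %HIP_PATH%
:: Should list hipInfo.exe from the ROCm bin folder if PATH is set correctly
where hipinfo
```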
### Replace HIP SDK library files for GPU architectures gfx1031 and gfx1032
Go to https://rocm.docs.amd.com/projects/install-on-windows/en/develop/reference/system-requirements.html and find your GPU model.
If your GPU model has a ✅ in both columns then skip to [Compilation, Settings, and First Generation](https://github.com/vladmandic/automatic/wiki/ZLUDA#compilation-settings-and-first-generation).
If your GPU model has a ❌ in the HIP SDK column (LLVM targets gfx1031 and gfx1032), follow the instructions below (a command-prompt sketch of the same steps appears after the list):
1. Go to `%HIP_PATH%bin\rocblas`
2. Rename `library` to something else, like `origlibrary`
3. Download [ROCmLibs.zip](https://github.com/brknsoul/ROCmLibs/raw/main/ROCmLibs.zip?download=)
   - Alternative: If you have a 6600 or 6600 XT (gfx1032) GPU, give [Optimised_ROCmLibs_gfx1032.7z](https://github.com/brknsoul/ROCmLibs/raw/main/Optimised_ROCmLibs_gfx1032.7z?download=) a go. It seems to be about 50% faster. Thanks, FremontDango!
4. Open the zip file.
5. Drag and drop the `library` folder from ROCmLibs.zip into `%HIP_PATH%bin\rocblas` (The folder you opened in step 1).
6. Reboot PC
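For reference, here is a rough command-prompt sketch of steps 1-5. It assumes the default HIP SDK install path and that `ROCmLibs.zip` was extracted to a `ROCmLibs` folder inside your Downloads folder; adjust the paths to wherever you actually extracted it:

```bat
:: Steps 1-2: back up the original rocBLAS library folder
cd /d "%HIP_PATH%bin\rocblas"
ren library origlibrary
:: Steps 3-5: copy the replacement library folder into place (assumed extraction location)
xcopy /e /i "%USERPROFILE%\Downloads\ROCmLibs\library" library
```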
If your GPU model is not supported by the HIP SDK and no prebuilt library is available above, follow the instructions in the [ROCm Support guide](https://github.com/vladmandic/automatic/wiki/Rocm-Support) to build your own rocBLAS libraries.
### Install or Update SD.Next
Install SD.Next:
`git clone https://github.com/vladmandic/automatic`
then
`cd automatic`
then
`webui.bat --use-zluda --debug --autolaunch`
Or, to update an existing SD.Next install (run the following from your current SD.Next install folder):
`venv\scripts\activate`
`pip uninstall -y torch-directml torch`
`deactivate`
`git pull`
`webui.bat --use-zluda --debug --autolaunch --reinstall`
_(after running successfully once, you can remove `--reinstall`)_
_Note: ZLUDA works best with the Diffusers backend, where certain Diffusers-only options are available._
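Once the first launch has finished installing everything, you can optionally sanity-check that the ZLUDA-patched torch sees your GPU. This is just a quick, optional check run from the SD.Next install folder; the reported device name typically includes "[ZLUDA]":

```bat
:: Activate SD.Next's virtual environment, query torch, then deactivate
venv\scripts\activate
python -c "import torch; ok = torch.cuda.is_available(); print(ok); print(torch.cuda.get_device_name(0) if ok else 'no GPU found')"
deactivate
```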
### Compilation, Settings, and First Generation
After the UI starts, head on over to System Tab > Compute Settings
Set "Attention optimization method" to "Dynamic Attention BMM"
Now, try to generate something.
This should take a fair while (10-15 minutes, or even longer; some reports state over an hour) to compile, but the compilation should only need to be done once.
Note: There will be no progress bar, as the compilation is done by ZLUDA and not SD.Next. Eventually your image will start generating.
## Comparison (DirectML vs. ZLUDA)
| | DirectML | ZLUDA |
|-------------|----------|--------|
| Speed | Slower | Faster |
| VRAM usage | More | Less |
| VRAM GC | ❌ | ✅ |
| Training | * | ✅ |
| Flash Attention | ❌ | ❌ |
| FFT | ❓ | ✅ |
| FFTW | ❓ | ❌ |
| DNN | ❓ | 🚧 |
| RTC | ❓ | ❌ |
| Source Code | Closed | Open |
| Python | <=3.10 | Same as CUDA |
*: Known to be possible, but uses too much VRAM to train Stable Diffusion models/LoRAs/etc.
## Compatibility
| DTYPE | Supported |
|-------|------------|
| FP64 | ✅ |
| FP32 | ✅ |
| FP16 | ✅ |
| BF16 | ✅ |
| LONG | ✅ |
| INT8 | ✅* |
| UINT8 | ✅* |
| INT4 | ❓ |
| FP8 | ❌ |
| BF8 | ❌ |
*: Not tested.
***
## Experimental Settings
#### The sections below are _optional_ and _highly experimental_, and aren't required to start generating images. Ensure you can generate images _before_ trying them.
### Experimental Speed Increase Using deepcache (optional)
Start SD.Next, head on over to System Tab > Compute Settings.
Scroll down to "Model Compile" and tick the 'Model', 'VAE', and 'Text Encoder' boxes.
Select "deep-cache" as your Model compile backend.
Apply and Shutdown, and restart SD.Next.
### Enabling ZLUDA DNN (partial support)
This is PARTIAL and INCOMPLETE support for a performance library, ZLUDA DNN. In most cases, trying this will be a waste of your time.
1. Checkout `dev` branch.
2. Download `ZLUDA.zip` from [v3.7-pre5-dnn](https://github.com/lshqqytiger/ZLUDA/releases/tag/v3.7-pre5-dnn) and unpack it over your ZLUDA folder. (default: `path/to/sd.next/.zluda`)
3. Replace `path/to/sd.next/venv/Lib/site-packages/torch/lib/cudnn64_8.dll` with `path/to/sd.next/.zluda/cudnn.dll`. (i.e. rename `cudnn.dll` and overwrite `torch/lib/cudnn64_8.dll`; see the sketch after this list)
4. Download `5.7.zip` from the same release and unpack it over your HIP SDK folder. (`%HIP_PATH%`)
5. Tick `Enable ZLUDA DNN`, and restart webui.
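A rough sketch of the file replacement in step 3, assuming (hypothetically) that SD.Next is installed in `C:\automatic`; substitute your own install path:

```bat
:: Hypothetical paths; back up the original cudnn64_8.dll first so you can revert later
copy /y "C:\automatic\venv\Lib\site-packages\torch\lib\cudnn64_8.dll" "C:\automatic\venv\Lib\site-packages\torch\lib\cudnn64_8.dll.bak"
:: Copy ZLUDA's cudnn.dll over torch's cudnn64_8.dll
copy /y "C:\automatic\.zluda\cudnn.dll" "C:\automatic\venv\Lib\site-packages\torch\lib\cudnn64_8.dll"
```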
#### Simple Comparison
RX 7900 XTX, with non-optimised settings:
- without DNN: 16.66 ~ 16.9it/s
- with DNN: 17.1 ~ 17.4it/s
---
### If you get a "ZLUDA: failed to automatically patch torch" error;
- Manually download the ZLUDA release marked with a green "latest" tag from https://github.com/lshqqytiger/ZLUDA/releases/
- Unzip it somewhere, like `C:\ZLUDA`
- Add `C:\ZLUDA` to your PATH following [this](https://github.com/brknsoul/ROCmLibs/wiki/Adding-folders-to-PATH) guide.
- *If there's a ZLUDA path there already, be sure to remove it, before closing the dialog box*
- Manually copy and rename the following files from `C:\ZLUDA` to `[SD.Next Install Folder]\venv\Lib\site-packages\torch\lib`, overwriting the originals:
  - `cublas.dll` -> `cublas64_11.dll`
  - `cusparse.dll` -> `cusparse64_11.dll`
  - `nvrtc.dll` -> `nvrtc64_112_0.dll`
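For example, a command-prompt sketch of those copies, assuming (hypothetically) that ZLUDA was unzipped to `C:\ZLUDA` and SD.Next is installed in `C:\automatic`; substitute your own paths:

```bat
:: Copy each ZLUDA DLL into torch's lib folder under its CUDA name, overwriting the originals
copy /y "C:\ZLUDA\cublas.dll" "C:\automatic\venv\Lib\site-packages\torch\lib\cublas64_11.dll"
copy /y "C:\ZLUDA\cusparse.dll" "C:\automatic\venv\Lib\site-packages\torch\lib\cusparse64_11.dll"
copy /y "C:\ZLUDA\nvrtc.dll" "C:\automatic\venv\Lib\site-packages\torch\lib\nvrtc64_112_0.dll"
```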