xmadai/Llama-3.1-8B-Instruct-xMADai-INT4

Text Generation

Inference Endpoints

4-bit precision

Oscar Wu committed 27 days ago

Commit 94b73f6 • 1 Parent(s): 1b33db0

Updated README

Files changed (1)

README.md +7 -2

@@ -32,12 +32,17 @@ This repository contains [`meta-llama/Llama-3.1-8B-Instruct`](https://huggingfac
 Loading the model checkpoint of this xMADified model requires less than 6 GiB of VRAM. Hence it can be efficiently run on a 8 GB GPU.
-**Package prerequisites**: Run the following commands to install the required packages.
+**Package prerequisites**:
+1. Run the following *commands to install the required packages.
 ```bash
-pip install torch==2.4.0 transformers accelerate optimum
+pip install torch==2.4.0  # Run following if you have CUDA version 11.8: pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118
+pip install transformers accelerate optimum
 pip install -vvv --no-build-isolation "git+https://github.com/PanQiWei/AutoGPTQ.git@v0.7.1"
 ```
 **Sample Inference Code**
 ```python