Update README.md
README.md
CHANGED
@@ -21,18 +21,21 @@ See the "No Enough Memory" section below if you do not have enough memory.
|
|
21 |
```
|
22 |
python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights
|
23 |
```
|
|
|
24 |
|
25 |
#### Multiple GPUs
|
26 |
You can use model parallelism to aggregate GPU memory from multiple GPUs on the same machine.
|
27 |
```
|
28 |
python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --num-gpus 2
|
29 |
```
|
|
|
30 |
|
31 |
#### CPU Only
|
32 |
This runs on the CPU only and does not require a GPU. It requires around 60GB of CPU memory for Vicuna-13B and around 30GB of CPU memory for Vicuna-7B.
|
33 |
```
|
34 |
python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --device cpu
|
35 |
```
|
|
|
36 |
|
37 |
#### Metal Backend (Mac Computers with Apple Silicon or AMD GPUs)
|
38 |
Use `--device mps` to enable GPU acceleration on Mac computers (requires torch >= 2.0).
|
@@ -40,6 +43,8 @@ Use `--load-8bit` to turn on 8-bit compression.
|
|
40 |
```
|
41 |
python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --device mps --load-8bit
|
42 |
```
|
|
|
|
|
43 |
Vicuna-7B can run on a 32GB M1 MacBook at 1 to 2 words per second.
|
44 |
|
45 |
|
@@ -52,6 +57,7 @@ Vicuna-13B with 8-bit compression can run on a single NVIDIA 3090/4080/V100(16GB
|
|
52 |
```
|
53 |
python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --load-8bit
|
54 |
```
|
|
|
55 |
|
56 |
Besides, we are actively exploring more methods to make the model easier to run on more platforms.
|
57 |
Contributions and pull requests are welcome.
|
@@ -73,6 +79,8 @@ This controller manages the distributed workers.
|
|
73 |
```bash
|
74 |
python3 -m fastchat.serve.model_worker --model-path /path/to/vicuna/weights
|
75 |
```
|
|
|
|
|
76 |
Wait until the process finishes loading the model and you see "Uvicorn running on ...". You can launch multiple model workers to serve multiple models concurrently. The model worker will connect to the controller automatically.
|
77 |
|
78 |
To ensure that your model worker is connected to your controller properly, send a test message using the following command:
|
|
|
21 |
```
|
22 |
python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights
|
23 |
```
|
24 |
+
When using Hugging Face, set `/path/to/vicuna/weights` to `jinxuewen/vicuna-13b`.
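As a minimal sketch of the note above, the repository ID can be substituted for the placeholder path. This assumes `--model-path` accepts a Hugging Face repository ID in the same way as a local directory, which the note implies but which is not verified here. The `MODEL_PATH` and `CMD` variables are introduced only for illustration.

```bash
# Assumption: --model-path accepts a Hugging Face repo ID as well as a local path.
MODEL_PATH="jinxuewen/vicuna-13b"   # repo ID taken from the note above

# Assemble the command as a string so it can be inspected before running it.
CMD="python3 -m fastchat.serve.cli --model-path $MODEL_PATH"

# Print the command instead of executing it (a dry run).
echo "$CMD"
```

To launch the CLI for real, run the printed command directly. The same substitution applies to the other commands in this README that take `--model-path`.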
|
25 |
|
26 |
#### Multiple GPUs
|
27 |
You can use model parallelism to aggregate GPU memory from multiple GPUs on the same machine.
|
28 |
```
|
29 |
python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --num-gpus 2
|
30 |
```
|
31 |
+
When using Hugging Face, set `/path/to/vicuna/weights` to `jinxuewen/vicuna-13b`.
|
32 |
|
33 |
#### CPU Only
|
34 |
This runs on the CPU only and does not require a GPU. It requires around 60GB of CPU memory for Vicuna-13B and around 30GB of CPU memory for Vicuna-7B.
|
35 |
```
|
36 |
python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --device cpu
|
37 |
```
|
38 |
+
When using Hugging Face, set `/path/to/vicuna/weights` to `jinxuewen/vicuna-13b`.
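One way to sanity-check the memory figures in the CPU Only section is a back-of-the-envelope estimate of weight storage. The sketch below assumes weights are held as 32-bit floats (4 bytes per parameter) on the CPU. That is an assumption, not something the README states, and `estimate_weight_gb` is a hypothetical helper.

```bash
# Rough weight-memory estimate:
#   parameters (in billions) x bytes per parameter = gigabytes of weights.
# Assumption: 4 bytes per parameter (32-bit floats) when running on the CPU.
estimate_weight_gb() {
  echo $(( $1 * $2 ))
}

estimate_weight_gb 13 4   # Vicuna-13B weights
estimate_weight_gb 7 4    # Vicuna-7B weights
```

This prints 52 and 28. The gap between these numbers and the quoted 60GB and 30GB would presumably cover runtime overhead beyond the weights, which this sketch does not model.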
|
39 |
|
40 |
#### Metal Backend (Mac Computers with Apple Silicon or AMD GPUs)
|
41 |
Use `--device mps` to enable GPU acceleration on Mac computers (requires torch >= 2.0).
|
|
|
43 |
```
|
44 |
python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --device mps --load-8bit
|
45 |
```
|
46 |
+
When using Hugging Face, set `/path/to/vicuna/weights` to `jinxuewen/vicuna-13b`.
|
47 |
+
|
48 |
Vicuna-7B can run on a 32GB M1 MacBook at 1 to 2 words per second.
|
49 |
|
50 |
|
|
|
57 |
```
|
58 |
python3 -m fastchat.serve.cli --model-path /path/to/vicuna/weights --load-8bit
|
59 |
```
|
60 |
+
When using Hugging Face, set `/path/to/vicuna/weights` to `jinxuewen/vicuna-13b`.
|
61 |
|
62 |
Besides, we are actively exploring more methods to make the model easier to run on more platforms.
|
63 |
Contributions and pull requests are welcome.
|
|
|
79 |
```bash
|
80 |
python3 -m fastchat.serve.model_worker --model-path /path/to/vicuna/weights
|
81 |
```
|
82 |
+
When using Hugging Face, set `/path/to/vicuna/weights` to `jinxuewen/vicuna-13b`.
|
83 |
+
|
84 |
Wait until the process finishes loading the model and you see "Uvicorn running on ...". You can launch multiple model workers to serve multiple models concurrently. The model worker will connect to the controller automatically.
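If the worker is started in the background, the wait described above can be scripted. The sketch below assumes the worker's output has been redirected to a log file. The file name `worker.log`, the 600-second timeout, and the `wait_for_log` helper are all hypothetical and not part of FastChat.

```bash
# Hypothetical helper: poll a log file until a pattern appears or a timeout expires.
# Usage: wait_for_log FILE PATTERN TIMEOUT_SECONDS
# Returns 0 if the pattern was found, 1 if the timeout ran out first.
wait_for_log() {
  file=$1
  pattern=$2
  remaining=$3
  while :; do
    # Succeed as soon as the pattern is present in the log.
    if grep -q "$pattern" "$file" 2>/dev/null; then
      return 0
    fi
    # Give up once the time budget is spent.
    if [ "$remaining" -le 0 ]; then
      return 1
    fi
    sleep 1
    remaining=$((remaining - 1))
  done
}

# Example usage (commented out so the snippet has no side effects):
# python3 -m fastchat.serve.model_worker --model-path /path/to/vicuna/weights > worker.log 2>&1 &
# wait_for_log worker.log "Uvicorn running on" 600 && echo "worker ready"
```

Redirecting both stdout and stderr (`2>&1`) is a cautious choice here, since which stream carries the startup message is not stated in this README.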
|
85 |
|
86 |
To ensure that your model worker is connected to your controller properly, send a test message using the following command:
|