Commit 230a504
Parent(s): f248b35

Update app.py (#3)

- Update app.py (d1e2541a9d1f605eb0338a3b3a439a59bea4cf98)
- Update app.py (3d4b5d4606c6ca8bab7fa45aa49bde790f1ecf8e)

Co-authored-by: Haotian Liu <liuhaotian@users.noreply.huggingface.co>

app.py CHANGED
```diff
@@ -342,12 +342,13 @@ title_markdown = """
 
 ONLY WORKS WITH GPU!
 
-You can load the model with
+You can load the model with 4-bit or 8-bit quantization to make it fit in smaller hardwares. Setting the environment variable `bits` to control the quantization.
+*Note: 8-bit seems to be slower than both 4-bit/16-bit. Although it has enough VRAM to support 8-bit, until we figure out the inference speed issue, we recommend 4-bit for A10G for the best efficiency.*
 
 Recommended configurations:
-| Hardware |
-
-| **Bits** |
+| Hardware | T4-Small (16G) | A10G-Small (24G) | A100-Large (40G) |
+|-------------------|-----------------|------------------|------------------|
+| **Bits** | 4 (default) | 4 | 16 |
 
 """
 
```
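The added text says quantization is controlled through a `bits` environment variable, with 4-bit as the default. As a minimal sketch of how the app might read that variable (the variable name comes from the commit message; the `load_4bit`/`load_8bit` flag names here are illustrative assumptions, not taken from `app.py`):

```python
import os

# Hypothetical sketch: read the `bits` environment variable described in
# this commit. Per the table above, 4-bit is the default; 8 and 16 are
# the other supported settings.
bits = int(os.environ.get("bits", 4))
if bits not in (4, 8, 16):
    raise ValueError(f"unsupported bits value: {bits}")

# Illustrative flags a model loader might consume (assumed names).
load_4bit = bits == 4
load_8bit = bits == 8
```

Usage would then look like `bits=8 python app.py` to switch an A10G Space to 8-bit, or leaving the variable unset to get the recommended 4-bit default.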