Update README.md
README.md CHANGED
@@ -63,13 +63,21 @@ Features of this architecture:

### Step 1: Environment Setup

- Since Hymba-1.5B-Instruct employs [FlexAttention](https://pytorch.org/blog/flexattention/), which relies on Pytorch2.5 and other related dependencies,
+ Since Hymba-1.5B-Instruct employs [FlexAttention](https://pytorch.org/blog/flexattention/), which relies on PyTorch 2.5 and other related dependencies, we provide two ways to set up the environment:
+
+ - **[Local install]** Install the related packages using our provided `setup.sh` (supports CUDA 12.1/12.4):

```
wget --header="Authorization: Bearer YOUR_HF_TOKEN" https://huggingface.co/nvidia/Hymba-1.5B-Base/resolve/main/setup.sh
bash setup.sh
```

+ - **[Docker]** A Docker image with all of Hymba's dependencies installed is also provided. You can pull the image and start a container with the following commands:
+ ```
+ docker pull ghcr.io/tilmto/hymba:v1
+ docker run --gpus all -v /home/$USER:/home/$USER -it ghcr.io/tilmto/hymba:v1 bash
+ ```
+

### Step 2: Chat with Hymba-1.5B-Instruct
After setting up the environment, you can use the following script to chat with our model:

@@ -99,7 +107,7 @@ stopping_criteria = StoppingCriteriaList([StopStringCriteria(tokenizer=tokenizer
outputs = model.generate(
    tokenized_chat,
    max_new_tokens=256,
-    do_sample=
+    do_sample=False,
    temperature=0.7,
    use_cache=True,
    stopping_criteria=stopping_criteria
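To confirm that the Step 1 environment actually provides FlexAttention, a quick sanity check can help. This is a minimal sketch assuming the standard `torch.nn.attention.flex_attention` module location that ships with PyTorch 2.5; it is not part of `setup.sh`:

```
# Sanity check: FlexAttention is bundled with PyTorch >= 2.5, so this import
# should succeed in a correctly prepared environment (local install or Docker).
import torch
from torch.nn.attention.flex_attention import flex_attention  # available since PyTorch 2.5

print(torch.__version__)          # expect 2.5 or newer
print(torch.cuda.is_available())  # expect True on a CUDA 12.1/12.4 machine
```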
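For context on the second hunk, the `do_sample=False` change sits inside the Step 2 chat script around `model.generate(...)`. A self-contained sketch of that flow is shown below; the model ID `nvidia/Hymba-1.5B-Instruct`, `trust_remote_code=True`, the example prompt, and the `</s>` stop string are assumptions for illustration rather than the verbatim README script:

```
# Sketch of the chat flow around the generate() call touched by the diff.
# Assumptions: model ID "nvidia/Hymba-1.5B-Instruct", trust_remote_code=True,
# and "</s>" as the stop string; adjust to match the actual README script.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          StoppingCriteriaList, StopStringCriteria)

repo = "nvidia/Hymba-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True,
                                             torch_dtype=torch.bfloat16).cuda()

# Build a chat-formatted prompt.
messages = [{"role": "user", "content": "Who are you?"}]
tokenized_chat = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop generation when the stop string is produced.
stopping_criteria = StoppingCriteriaList(
    [StopStringCriteria(tokenizer=tokenizer, stop_strings=["</s>"])]
)

outputs = model.generate(
    tokenized_chat,
    max_new_tokens=256,
    do_sample=False,      # greedy decoding, as in the updated README
    temperature=0.7,      # kept to mirror the README; ignored when do_sample=False
    use_cache=True,
    stopping_criteria=stopping_criteria,
)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][tokenized_chat.shape[1]:], skip_special_tokens=True))
```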