NeMo
jiaqiz committed on
Commit
1c7a68e
1 Parent(s): 5738a00

Update README.md

Files changed (1): README.md +3 -3
README.md CHANGED
@@ -104,7 +104,7 @@ Deployment and inference with Nemotron-4-340B-Instruct can be done in three step
 
 Create a Python script to interact with the deployed model.
 Create a Bash script to start the inference server
-Schedule a Slurm job to distribute the model across 4 nodes and associate them with the inference server.
+Schedule a Slurm job to distribute the model across 2 nodes and associate them with the inference server.
 
 1. Define the Python script ``call_server.py``
 
@@ -154,7 +154,7 @@ if response.endswith("<extra_id_1>"):
 print(response)
 ```
 
-2. Given this Python script, create a Bash script which spins up the inference server within the [NeMo container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) (docker pull nvcr.io/nvidia/nemo:24.01.framework) and calls the Python script ``call_server.py``. The Bash script ``nemo_inference.sh`` is as follows,
+2. Given this Python script, create a Bash script which spins up the inference server within the [NeMo container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) (```docker pull nvcr.io/nvidia/nemo:24.01.framework```) and calls the Python script ``call_server.py``. The Bash script ``nemo_inference.sh`` is as follows,
 
 ```
 NEMO_FILE=$1
@@ -204,7 +204,7 @@ depends_on () {
 ```
 
 
-3. Launch ``nemo_inference.sh`` with a Slurm script defined like below, which starts a 4-node job for model inference.
+3. Launch ``nemo_inference.sh`` with a Slurm script defined like below, which starts a 2-node job for model inference.
 
 ```
 #!/bin/bash
```
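
For context, the hunk headers above show that ``call_server.py`` checks ``response.endswith("<extra_id_1>")`` before printing, i.e. it strips the model's trailing turn delimiter. A minimal sketch of that post-processing, plus a hypothetical request-payload builder — the payload field names here are an assumption modeled on NeMo-style text-generation servers, not confirmed by this diff; the authoritative version is the full ``call_server.py`` in the README:

```python
def build_request(prompt: str, tokens_to_generate: int = 256,
                  temperature: float = 1.0, top_p: float = 0.9) -> dict:
    """Build a JSON payload for the inference server.

    NOTE: the field names below are assumptions (hypothetical), patterned
    on NeMo-style text-generation servers; consult the README's full
    ``call_server.py`` for the real schema.
    """
    return {
        "sentences": [prompt],
        "tokens_to_generate": tokens_to_generate,
        "temperature": temperature,
        "top_p": top_p,
    }


def clean_response(response: str, stop_token: str = "<extra_id_1>") -> str:
    """Mirror the post-processing visible in the diff's hunk context:
    ``if response.endswith("<extra_id_1>")`` precedes ``print(response)``,
    so the trailing turn delimiter is stripped before output."""
    if response.endswith(stop_token):
        response = response[: -len(stop_token)].rstrip()
    return response
```

In the README the cleaned response is then printed; actually sending the payload to the deployed server (e.g. with ``requests``) is handled by the full script and omitted here.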