
multi node

#2 by hanlu0929 - opened

python /opt/NeMo-Aligner/examples/nlp/gpt/serve_reward_model.py \
    rm_model_file=Nemotron-4-340B-Reward \
    trainer.num_nodes=2 \
    trainer.devices=8

Help: how do I configure the environment to run this on 2 nodes?

NVIDIA org

Hi @hanlu0929 , it depends on your environment/setup, but if you're using a SLURM environment, you also need to make sure your job allocation requests -N 2 nodes (with 8 x 80GB GPUs each) in the sbatch/srun command, in addition to running this python command.

Please share more details about your environment so that we can provide more guidance or redirect the question to colleagues with experience in settings other than SLURM. If this information is potentially sensitive, please email it to zhilinw@nvidia.com and yidong@nvidia.com and we are happy to continue the conversation there.
