ibm-fms/llama2-70b-accelerator


sahilsuneja committed on Jul 24, 2024

Commit 14f26be · verified · 1 Parent(s): 7bb249d

Update README.md

Files changed (1)

README.md +1 -1


@@ -106,7 +106,7 @@ _Note: first prompt may be slower as there is a slight warmup time_
 #### start the server
 ```bash
-model=ibm-fms/llama3-8b-accelerator
+model=ibm-fms/llama2-70b-accelerator
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model