JRosenkranz committed 924f16b (parent 0699e13): updated readme with samples

README.md
---
license: llama2
---

To try this out in a production-like environment, use the pre-built Docker image:

```bash
docker pull docker-eu-public.artifactory.swg-devops.com/res-zrl-snap-docker-local/tgis-os:spec.7
docker run -d --rm --gpus all \
    --name my-tgis-server \
    -v /path/to/all/models:/models \
    -e MODEL_NAME=/models/model_weights/llama/13B-F \
    -e SPECULATOR_PATH=/models/speculator_weights/llama/13B-F \
    -e FLASH_ATTENTION=true \
    -e PAGED_ATTENTION=true \
    -e DTYPE_STR=float16 \
    docker-eu-public.artifactory.swg-devops.com/res-zrl-snap-docker-local/tgis-os:spec.7

docker logs my-tgis-server -f
docker exec -it my-tgis-server python /path-to-example-code/sample_client.py
```
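A common failure mode with the `docker run` invocation above is a typo in the mounted weight paths, which only surfaces once the server tries to load the model. A small sanity check on the host can catch this earlier; a minimal sketch, assuming the placeholder paths from the example (`missing_dirs` is a hypothetical helper, not part of TGIS):

```python
from pathlib import Path

# Placeholder paths from the docker run example above;
# substitute your real weight locations before running.
weight_dirs = [
    "/path/to/all/models/model_weights/llama/13B-F",
    "/path/to/all/models/speculator_weights/llama/13B-F",
]

def missing_dirs(paths):
    """Return the subset of paths that do not exist as directories."""
    return [p for p in paths if not Path(p).is_dir()]

for p in missing_dirs(weight_dirs):
    print(f"missing: {p}")
```

If anything prints, fix the host paths (or the `-v` mount) before starting the container.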
To try this out with the fms-native compiled model, please execute the following:

#### batch_size=1 (compile + cudagraphs)

```bash
git clone https://github.com/foundation-model-stack/fms-extras
(cd fms-extras && pip install -e .)
pip install transformers==4.35.0 sentencepiece numpy
python fms-extras/scripts/paged_speculative_inference.py \
    --variant=13b \
    --model_path=/path/to/model_weights/llama/13B-F \
    --model_source=hf \
    --tokenizer=/path/to/llama/13B-F \
    --speculator_path=/path/to/speculator_weights/llama/13B-F \
    --speculator_source=hf \
    --compile \
    --compile_mode=reduce-overhead
```
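The only difference between the "compile + cudagraphs" and plain "compile" variants is the compile mode. In PyTorch, `mode="reduce-overhead"` asks `torch.compile` to use CUDA graphs to cut per-call kernel-launch overhead, which tends to pay off at small batch sizes. A minimal, model-agnostic sketch of the two modes (`step` is a stand-in function, not the fms-extras code):

```python
import torch

def step(x):
    # Stand-in for a decode step; any tensor function works here.
    return torch.relu(x) * 2

# Default compilation (the plain "compile" variants below):
compiled = torch.compile(step)

# CUDA-graph capture to reduce launch overhead
# (the "compile + cudagraphs" variant above):
compiled_cg = torch.compile(step, mode="reduce-overhead")
```

Compilation is lazy in both cases: the graph is actually built on the first call.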
40 |
+
|
41 |
+
#### batch_size=1 (compile)
|
42 |
+
|
43 |
+
```bash
|
44 |
+
git clone https://github.com/foundation-model-stack/fms-extras
|
45 |
+
(cd fms-extras && pip install -e .)
|
46 |
+
pip install transformers==4.35.0 sentencepiece numpy
|
47 |
+
python fms-extras/scripts/paged_speculative_inference.py \
|
48 |
+
--variant=13b \
|
49 |
+
--model_path=/path/to/model_weights/llama/13B-F \
|
50 |
+
--model_source=hf \
|
51 |
+
--tokenizer=/path/to/llama/13B-F \
|
52 |
+
--speculator_path=/path/to/speculator_weights/llama/13B-F \
|
53 |
+
--speculator_source=hf \
|
54 |
+
--compile \
|
55 |
+
```
#### batch_size=4 (compile)

```bash
git clone https://github.com/foundation-model-stack/fms-extras
(cd fms-extras && pip install -e .)
pip install transformers==4.35.0 sentencepiece numpy
python fms-extras/scripts/paged_speculative_inference.py \
    --variant=13b \
    --model_path=/path/to/model_weights/llama/13B-F \
    --model_source=hf \
    --tokenizer=/path/to/llama/13B-F \
    --speculator_path=/path/to/speculator_weights/llama/13B-F \
    --speculator_source=hf \
    --batch_input \
    --compile
```
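All three invocations run speculative decoding: the small speculator drafts a few tokens ahead, the base model verifies them in a single forward pass, and the longest agreeing prefix is kept. A toy sketch of that accept/verify idea (illustrative only; `speculative_step` and `verify_token_fn` are hypothetical names, not the fms-extras API):

```python
def speculative_step(draft_tokens, verify_token_fn):
    """Accept drafted tokens while the verifier (base model) agrees.

    draft_tokens: tokens proposed by the small speculator.
    verify_token_fn: returns the base model's token at a given position.
    Returns the accepted prefix, with one corrected token on mismatch.
    """
    accepted = []
    for pos, tok in enumerate(draft_tokens):
        expected = verify_token_fn(pos)
        if tok == expected:
            accepted.append(tok)       # speculator guessed correctly
        else:
            accepted.append(expected)  # take the base model's token and stop
            break
    return accepted
```

When most drafts are accepted, each base-model forward pass yields several tokens instead of one, which is where the speedup comes from.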