wwwaj committed
Commit 40a689c
1 Parent(s): f054138

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -127,7 +127,7 @@ print(output[0]['generated_text'])
 Note that by default the model use flash attention which requires certain types of GPU to run. If you want to run the model on:
 
 + V100 or earlier generation GPUs: call `AutoModelForCausalLM.from_pretrained()` with `attn_implementation="eager"`
-+ Optimized inference: use the **ONNX** models [128K](https://aka.ms/phi3-mini-128k-instruct-onnx)
++ Optimized inference on GPU, CPU, and Mobile: use the **ONNX** models [128K](https://aka.ms/phi3-mini-128k-instruct-onnx)
 
 ## Responsible AI Considerations
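The diff's note says flash attention requires certain GPUs, and that V100 or earlier generations must pass `attn_implementation="eager"` to `AutoModelForCausalLM.from_pretrained()`. A minimal sketch of that fallback logic, assuming CUDA compute capability as the deciding factor (flash attention requires Ampere, capability 8.0 or newer; V100 is 7.0). The helper function name is hypothetical, not part of the README:

```python
def attn_implementation_for(compute_capability: float) -> str:
    """Pick a transformers attention backend from the GPU's CUDA
    compute capability. Flash attention needs Ampere or newer
    (>= 8.0); V100 (7.0) and earlier fall back to "eager"."""
    return "flash_attention_2" if compute_capability >= 8.0 else "eager"


# Usage sketch (model id assumed, weights not downloaded here):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "microsoft/Phi-3-mini-128k-instruct",
#     attn_implementation=attn_implementation_for(7.0),  # V100 -> "eager"
# )
```

In practice the capability can be read at runtime with `torch.cuda.get_device_capability()` rather than hard-coded.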