microsoft/Phi-3-mini-128k-instruct

I've run both the Phi-3-mini-4k-instruct and Phi-3-mini-128k-instruct models and get slow responses (~2 tokens per second) despite having GPU access (although GPU usage maxes out at 50%??). I am running them via the text-generation-webui Jetson Docker container on a Jetson Orin Nano. My settings are below. Is there something I need to change?

You: <|user|>
I am going to Rotterdam, what should I see?<|end|>
<|assistant|>
<|end|>
<|assistant|>
AI:
18:09:30-897515 INFO GENERATE_PARAMS=
{ 'max_new_tokens': 512,
'temperature': 0.7,
'temperature_last': False,
'dynamic_temperature': False,
'dynatemp_low': 1,
'dynatemp_high': 1,
'dynatemp_exponent': 1,
'top_p': 0.9,
'min_p': 0,
'top_k': 20,
'repetition_penalty': 1.15,
'presence_penalty': 0,
'frequency_penalty': 0,
'repetition_penalty_range': 1024,
'typical_p': 1,
'tfs': 1,
'top_a': 0,
'guidance_scale': 1,
'penalty_alpha': 0,
'mirostat_mode': 0,
'mirostat_tau': 5,
'mirostat_eta': 0.1,
'do_sample': True,
'encoder_repetition_penalty': 1,
'no_repeat_ngram_size': 0,
'min_length': 0,
'num_beams': 1,
'length_penalty': 1,
'early_stopping': False,
'use_cache': True,
'eos_token_id': [32000],
'stopping_criteria': [ <modules.callbacks._StopEverythingStoppingCriteria object at 0xffff2720c5b0>],
'logits_processor': []}

Output generated in 120.04 seconds (2.24 tokens/s, 269 tokens, context 668, seed 1612027793)
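
As a sanity check on that figure, here is a minimal Python sketch that parses the summary line and recomputes the rate. The function name `parse_generation_line` and the regular expression are my own assumptions, based only on the single log line above, so other versions of the log format may not match. The recomputed value (269 tokens over 120.04 seconds) agrees with the reported 2.24 tokens/s, which suggests the rate is generated tokens divided by total wall time, so time spent processing the 668-token context would be included in it.

```python
import re

# The summary line exactly as it appears in the log above.
LOG_LINE = ("Output generated in 120.04 seconds "
            "(2.24 tokens/s, 269 tokens, context 668, seed 1612027793)")

# Assumed format, inferred from that one line only.
PATTERN = re.compile(
    r"Output generated in (?P<seconds>[\d.]+) seconds "
    r"\((?P<rate>[\d.]+) tokens/s, (?P<tokens>\d+) tokens, "
    r"context (?P<context>\d+), seed (?P<seed>\d+)\)"
)


def parse_generation_line(line):
    """Extract timing fields and recompute tokens per second (hypothetical helper)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like a generation summary")
    seconds = float(match["seconds"])
    tokens = int(match["tokens"])
    return {
        "seconds": seconds,
        "tokens": tokens,
        "context": int(match["context"]),
        "reported_rate": float(match["rate"]),
        # Generated tokens divided by total elapsed time.
        "computed_rate": tokens / seconds,
    }


stats = parse_generation_line(LOG_LINE)
print(f"reported {stats['reported_rate']:.2f} tokens/s, "
      f"computed {stats['computed_rate']:.2f} tokens/s")
```

Running the same parser over several generations with different context lengths would show whether the rate drops as the context grows.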

Slow?