---
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- mistral
- inferentia2
- neuron
- neuronx
license: apache-2.0
---

# Neuronx for [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) - Updated Mistral 7B Model on [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) Using AWS Neuron SDK version 2.18~

This model has been exported to the `neuron` format using the specific `input_shapes` and `compiler` parameters detailed in the sections below.

Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.

Note: To compile mistralai/Mistral-7B-Instruct-v0.2 on Inf2, you need to change `sliding_window` in the model config (either in the config file or on the loaded model's config) from `null` to the default value of 4096.

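The config change described in the note can be sketched as follows. This is a minimal illustration that assumes you have a local copy of the model's `config.json`; the helper name `set_sliding_window` and the file path in the usage line are hypothetical and not part of the original export workflow.

```python
import json
from pathlib import Path


def set_sliding_window(config_path, value=4096):
    """Set `sliding_window` in a local config.json when it is null or missing."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Only overwrite when the field is null (None in Python) or absent,
    # so an explicitly configured window size is left untouched.
    if config.get("sliding_window") is None:
        config["sliding_window"] = value
        path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `set_sliding_window("Mistral-7B-Instruct-v0.2/config.json")` would patch a downloaded copy in place before compilation, assuming that is where the config file lives.
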
## Usage with 🤗 `TGI`

Refer to the container image in the [neuronx-tgi](https://gallery.ecr.aws/shtian/neuronx-tgi) repository on the Amazon ECR Public Gallery.

```shell
# Hugging Face access token, passed to the container
export HF_TOKEN="hf_xxx"

# Start the TGI server with the first Neuron device attached
docker run -d -p 8080:80 \
       --name mistral-7b-neuronx-tgi \
       -v $(pwd)/data:/data \
       --device=/dev/neuron0 \
       -e HF_TOKEN=${HF_TOKEN} \
       public.ecr.aws/shtian/neuronx-tgi:latest \
       --model-id davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18 \
       --max-batch-size 1 \
       --max-input-length 16 \
       --max-total-tokens 32

# Send a test request once the server is up
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"Who are you?","parameters":{"max_new_tokens":16}}' \
    -H 'Content-Type: application/json'
```

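The same request can be sent from Python. The sketch below uses only the standard library and mirrors the `curl` call above; the helper names `build_generate_request` and `generate` are hypothetical, and the code assumes the server answers with a JSON object that contains a `generated_text` field.

```python
import json
from urllib import request


def build_generate_request(prompt, max_new_tokens=16, base_url="http://127.0.0.1:8080"):
    """Build the same POST request that the curl command sends to /generate."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    with request.urlopen(build_generate_request(prompt, **kwargs)) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Assumption: the response is a JSON object with a "generated_text" key
    return body["generated_text"]
```

With the container from the previous step running, `print(generate("Who are you?"))` should print the model's reply. Keep the prompt short, because the server above was started with `--max-input-length 16`.
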
## Usage with 🤗 `optimum-neuron pipeline`

```python
from optimum.neuron import pipeline

# Load the precompiled Neuron model into a text-generation pipeline
p = pipeline('text-generation', 'davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18')
p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)

# Sample output (sampling is enabled, so results vary between runs):
# [{'generated_text': "My favorite place on earth is probably Paris, France, and if I were to go there
# now I would take my partner on a romantic getaway where we could lay on the grass in the park,
# eat delicious French cheeses and wine, and watch the sunset on the Seine river.'"}]
```

## Usage with 🤗 `optimum-neuron NeuronModelForCausalLM`

```python
import torch
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForCausalLM

# Load the precompiled Neuron model and the tokenizer of the original model
model = NeuronModelForCausalLM.from_pretrained("davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18")

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.pad_token_id = tokenizer.eos_token_id

def model_sample(input_prompt):
    # Wrap the prompt in the Mistral instruction format
    input_prompt = "[INST] " + input_prompt + " [/INST]"

    tokens = tokenizer(input_prompt, return_tensors="pt")

    with torch.inference_mode():
        sample_output = model.generate(
            **tokens,
            do_sample=True,
            min_length=16,
            max_length=32,
            temperature=0.5,
            pad_token_id=tokenizer.eos_token_id
        )
        outputs = [tokenizer.decode(tok, skip_special_tokens=True) for tok in sample_output]

    # Keep only the text after the instruction; special tokens such as </s>
    # were already removed by skip_special_tokens=True
    res = outputs[0].split('[/INST]')[1].strip()
    return res + "\n"

print(model_sample("how are you today?"))
```

This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, use the repo revision that matches your version of `neuronx` so that the right serialized checkpoints are loaded.

## Arguments passed during export

**input_shapes**

```json
{
  "batch_size": 1,
  "sequence_length": 2048
}
```

**compiler_args**

```json
{
  "auto_cast_type": "bf16",
  "num_cores": 2
}
```
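
As an illustration of how these arguments fit together, the sketch below passes them to `NeuronModelForCausalLM.from_pretrained` with `export=True`, following the pattern in the `optimum-neuron` export guide linked at the top of this card. It is not necessarily the exact command used to build this repository. The function name `export_to_neuron` and the `save_dir` argument are hypothetical, and the call must run on a Neuron host (for example an Inf2 instance).

```python
# Values copied from the two JSON blocks above
input_shapes = {"batch_size": 1, "sequence_length": 2048}
compiler_args = {"auto_cast_type": "bf16", "num_cores": 2}


def export_to_neuron(model_id, save_dir):
    """Compile `model_id` for Neuron with the arguments above and save the result."""
    # Imported lazily so this module can be loaded without optimum-neuron installed
    from optimum.neuron import NeuronModelForCausalLM

    # export=True triggers compilation; shapes and compiler options go in as kwargs
    model = NeuronModelForCausalLM.from_pretrained(
        model_id, export=True, **compiler_args, **input_shapes
    )
    model.save_pretrained(save_dir)
    return model
```

A call such as `export_to_neuron("mistralai/Mistral-7B-Instruct-v0.2", "mistral-neuron")` would then compile and save the model. The output directory name here is only an example.
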