---
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- mistral
- inferentia2
- neuron
- neuronx
license: apache-2.0
---
# Neuronx for [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) - Updated Mistral 7B Model on [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) Using AWS Neuron SDK Version 2.18

This model has been exported to the `neuron` format using the specific `input_shapes` and `compiler` parameters detailed in the paragraphs below.

Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.

Note: to compile mistralai/Mistral-7B-Instruct-v0.2 on Inf2, you need to update the `sliding_window` value in the model config (either in the config file or on the loaded config object) from `null` to the default `4096`.
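As a minimal sketch of that config change (the stand-in config dict below is illustrative, not taken from this repo's files — in practice you would `json.load()` the checkpoint's `config.json`), the fix amounts to replacing `null` with `4096`:

```python
import json

# Stand-in for the relevant part of the model's config.json;
# the real file has many more fields.
cfg = {"model_type": "mistral", "sliding_window": None}

# Neuron compilation needs a concrete window size: replace null with the default 4096.
if cfg.get("sliding_window") is None:
    cfg["sliding_window"] = 4096

print(json.dumps(cfg, indent=2))
```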

## Usage with 🤗 `TGI`

Refer to the container image on the [neuronx-tgi](https://gallery.ecr.aws/shtian/neuronx-tgi) Amazon ECR Public Gallery.

```shell
export HF_TOKEN="hf_xxx"

docker run -d -p 8080:80 \
  --name mistral-7b-neuronx-tgi \
  -v $(pwd)/data:/data \
  --device=/dev/neuron0 \
  --device=/dev/neuron1 \
  --device=/dev/neuron2 \
  --device=/dev/neuron3 \
  --device=/dev/neuron4 \
  --device=/dev/neuron5 \
  --device=/dev/neuron6 \
  --device=/dev/neuron7 \
  --device=/dev/neuron8 \
  --device=/dev/neuron9 \
  --device=/dev/neuron10 \
  --device=/dev/neuron11 \
  -e HF_TOKEN=${HF_TOKEN} \
  public.ecr.aws/shtian/neuronx-tgi:latest \
  --model-id davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-24-cores-2.18 \
  --max-batch-size 1 \
  --max-input-length 16 \
  --max-total-tokens 32

curl 127.0.0.1:8080/generate \
  -X POST \
  -d '{"inputs":"Who are you?","parameters":{"max_new_tokens":16}}' \
  -H 'Content-Type: application/json'
```
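The same `/generate` request can be issued from Python. This is a sketch that mirrors the `curl` call above (endpoint path and payload shape come from that call, nothing new); it assumes the TGI container is already listening on port 8080, so the actual request is left commented out:

```python
import json
import urllib.request

# Build the same request body the curl example sends.
payload = {"inputs": "Who are you?", "parameters": {"max_new_tokens": 16}}
req = urllib.request.Request(
    "http://127.0.0.1:8080/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the TGI container from the docker run command is up:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```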

## Usage with 🤗 `optimum-neuron pipeline`

```python
from optimum.neuron import pipeline

p = pipeline('text-generation', 'davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-24-cores-2.18')
p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
# [{'generated_text': "My favorite place on earth is probably Paris, France, and if I were to go there
#   now I would take my partner on a romantic getaway where we could lay on the grass in the park,
#   eat delicious French cheeses and wine, and watch the sunset on the Seine river."}]
```

## Usage with 🤗 `optimum-neuron NeuronModelForCausalLM`

```python
import torch
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForCausalLM

model = NeuronModelForCausalLM.from_pretrained("davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-24-cores-2.18")

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.pad_token_id = tokenizer.eos_token_id

def model_sample(input_prompt):
    # Wrap the prompt in the Mistral instruct template
    input_prompt = "[INST] " + input_prompt + " [/INST]"

    tokens = tokenizer(input_prompt, return_tensors="pt")

    with torch.inference_mode():
        sample_output = model.generate(
            **tokens,
            do_sample=True,
            min_length=16,
            max_length=32,
            temperature=0.5,
            pad_token_id=tokenizer.eos_token_id,
        )

    outputs = [tokenizer.decode(tok, skip_special_tokens=True) for tok in sample_output]

    # Keep only the completion after [/INST] and strip the end-of-sequence token
    res = outputs[0].split('[/INST]')[1].strip("</s>").strip()
    return res + "\n"

print(model_sample("how are you today?"))
```
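The post-processing step above (splitting on `[/INST]` and stripping `</s>`) can be exercised on its own, without the model, using a made-up decoded string:

```python
def extract_completion(decoded: str) -> str:
    # Mirror the post-processing in model_sample: keep the text after
    # [/INST], strip the </s> end-of-sequence marker and whitespace.
    return decoded.split('[/INST]')[1].strip("</s>").strip()

# Dummy decoded output, just to illustrate the format.
sample = "[INST] how are you today? [/INST] I am doing well, thank you!</s>"
print(extract_completion(sample))  # → I am doing well, thank you!
```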

This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, load the repo revision that matches the version of `neuronx` you are running, so that the right serialized checkpoints are picked up.

## Arguments passed during export

**input_shapes**

```json
{
  "batch_size": 1,
  "sequence_length": 2048
}
```

**compiler_args**

```json
{
  "auto_cast_type": "bf16",
  "num_cores": 24
}
```
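If you want to reproduce the export yourself, these parameters map onto the `optimum-cli export neuron` command. This is a sketch, not the exact command used for this repo: it requires an Inf2 instance with the Neuron SDK installed, the output directory name is arbitrary, and exact flags may vary with your `optimum-neuron` version.

```shell
# Recompile the checkpoint with the input_shapes and compiler_args listed above
# (remember the sliding_window config fix noted at the top of this card).
optimum-cli export neuron \
  --model mistralai/Mistral-7B-Instruct-v0.2 \
  --batch_size 1 \
  --sequence_length 2048 \
  --num_cores 24 \
  --auto_cast_type bf16 \
  mistral7b_neuron/
```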