davidshtian committed
Commit d880cf5
1 Parent(s): 61e54ef

Update README.md

Files changed (1): README.md (+50, -0)
README.md CHANGED
---
language:
- en
pipeline_tag: text-generation
inference: false
tags:
- mistral
- inferentia2
- neuron
- neuronx
license: apache-2.0
---
# Neuronx for [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) - Updated Mistral 7B Model on [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) Using AWS Neuron SDK version 2.18

This model has been exported to the `neuron` format using specific `input_shapes` and `compiler` parameters detailed in the paragraphs below.

Please refer to the 🤗 `optimum-neuron` [documentation](https://huggingface.co/docs/optimum-neuron/main/en/guides/models#configuring-the-export-of-a-generative-model) for an explanation of these parameters.

Note: to compile mistralai/Mistral-7B-Instruct-v0.2 on Inf2, you need to update `sliding_window` in the model config (either in the `config.json` file or on the loaded config object) from `null` to its default value of 4096.
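
As a minimal sketch of that update (assuming you patch the config through 🤗 `transformers` rather than editing `config.json` by hand; the local output path is a placeholder):

```python
from transformers import AutoConfig

# In Mistral-7B-Instruct-v0.2 the config ships with sliding_window set to null,
# which the Neuron compiler does not accept.
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Restore the default window size before exporting.
config.sliding_window = 4096

# Save the patched config locally so the export step can pick it up.
config.save_pretrained("./mistral-7b-instruct-v0.2-patched")
```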

## Usage with 🤗 `optimum-neuron`

```python
>>> from optimum.neuron import pipeline

>>> p = pipeline('text-generation', 'davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18')
>>> p("My favorite place on earth is", max_new_tokens=64, do_sample=True, top_k=50)
[{'generated_text': "My favorite place on earth is probably Paris, France, and if I were to go there
now I would take my partner on a romantic getaway where we could lay on the grass in the park,
eat delicious French cheeses and wine, and watch the sunset on the Seine river.'"}]
```

This repository contains tags specific to versions of `neuronx`. When using it with 🤗 `optimum-neuron`, load the repo revision that matches the version of `neuronx` you are running so that the right serialized checkpoints are used.
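
For example (a sketch that assumes the `revision` argument is forwarded to the model loader as in the 🤗 `transformers` pipeline API, and that a tag named after your `neuronx` release exists in this repository):

```python
from optimum.neuron import pipeline

# "2.18" is a hypothetical tag name used for illustration; check the
# repository's tags on the Hub for the one matching your neuronx version.
p = pipeline(
    "text-generation",
    "davidshtian/Mistral-7B-Instruct-v0.2-neuron-1x2048-2-cores-2.18",
    revision="2.18",
)
```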

## Arguments passed during export

**input_shapes**

```json
{
  "batch_size": 1,
  "sequence_length": 2048
}
```

**compiler_args**

```json
{
  "auto_cast_type": "bf16",
  "num_cores": 2
}
```
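
For reference, an export along these lines (a sketch based on the 🤗 `optimum-neuron` Python export API; the local paths are placeholders, and the checkpoints in this repository may have been produced differently) could look like:

```python
from optimum.neuron import NeuronModelForCausalLM

# Compile with the same input shapes and compiler arguments listed above.
model = NeuronModelForCausalLM.from_pretrained(
    "./mistral-7b-instruct-v0.2-patched",  # model dir with the patched sliding_window config
    export=True,
    batch_size=1,
    sequence_length=2048,
    num_cores=2,
    auto_cast_type="bf16",
)

# Save the compiled artifacts for reuse on Inferentia2.
model.save_pretrained("./mistral-7b-instruct-v0.2-neuron")
```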