---
language:
- en
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
- inferentia2
- neuron
extra_gated_heading: Access Llama 2 on Hugging Face
extra_gated_description: This is a form to enable access to Llama 2 on Hugging Face
  after you have been granted access from Meta. Please visit the [Meta website](https://ai.meta.com/resources/models-and-libraries/llama-downloads)
  and accept our license terms and acceptable use policy before submitting this form.
  Requests will be processed in 1-2 days.
extra_gated_prompt: '**Your Hugging Face account email address MUST match the email
  you provide on the Meta website, or your request will not be approved.**'
extra_gated_button_content: Submit
extra_gated_fields:
  ? I agree to share my name, email address and username with Meta and confirm that
    I have already been granted download access on the Meta website
  : checkbox
pipeline_tag: text-generation
inference: false
arxiv: 2307.09288
---
# Neuronx model for [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)

This repository contains an [**AWS Inferentia2**](https://aws.amazon.com/ec2/instance-types/inf2/) and [`neuronx`](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) compatible checkpoint for [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf). You can find detailed information about the base model on its [Model Card](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).

## Usage on Amazon SageMaker

_coming soon_

## Usage with optimum-neuron

```python
from optimum.neuron import pipeline

# Load pipeline from Hugging Face repository
pipe = pipeline("text-generation", "aws-neuron/Llama-2-7b-chat-hf-seqlen-2048-bs-2")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {"role": "user", "content": "What is 2+2?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Run generation
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

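To make the chat-template step above more concrete, here is a minimal, dependency-free sketch of how a Llama 2 chat prompt is typically laid out. It is an approximation for illustration only: `format_llama2_chat` is a hypothetical helper, not part of `optimum-neuron` or `transformers`, and the tokenizer's own `apply_chat_template` remains the source of truth.

```python
# Approximate rendering of the Llama 2 chat format (illustrative only).
# Assumption: this mirrors the template shipped with the tokenizer; always
# prefer pipe.tokenizer.apply_chat_template when the tokenizer is available.
BOS, EOS = "<s>", "</s>"


def format_llama2_chat(messages):
    """Render a list of {"role", "content"} dicts as a Llama 2 chat prompt."""
    system = None
    if messages and messages[0]["role"] == "system":
        # An optional system message is folded into the first user turn.
        system = messages[0]["content"]
        messages = messages[1:]

    parts = []
    for i, message in enumerate(messages):
        expected = "user" if i % 2 == 0 else "assistant"
        if message["role"] != expected:
            raise ValueError("roles must alternate user/assistant")
        content = message["content"]
        if i == 0 and system is not None:
            content = f"<<SYS>>\n{system}\n<</SYS>>\n\n{content}"
        if message["role"] == "user":
            parts.append(f"{BOS}[INST] {content.strip()} [/INST]")
        else:
            parts.append(f" {content.strip()} {EOS}")
    return "".join(parts)


print(format_llama2_chat([{"role": "user", "content": "What is 2+2?"}]))
# <s>[INST] What is 2+2? [/INST]
```

Seeing the raw layout can help when debugging unexpected generations, for example when a prompt was passed to the pipeline without going through the template.
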
## Compilation Arguments

**compilation arguments**

```json
{
  "num_cores": 2,
  "auto_cast_type": "fp16"
}
```

**input_shapes**

```json
{
  "sequence_length": 2048,
  "batch_size": 2
}
```
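
Because the checkpoint is compiled for fixed input shapes, it can be useful to check a request against them before calling the model. The sketch below assumes that `sequence_length` bounds prompt tokens plus generated tokens, and that at most `batch_size` prompts are processed together. The helper names `max_new_tokens_for` and `batches` are hypothetical and not part of any library.

```python
# Static shapes taken from the input_shapes block above.
SEQUENCE_LENGTH = 2048
BATCH_SIZE = 2


def max_new_tokens_for(prompt_tokens, requested=256, sequence_length=SEQUENCE_LENGTH):
    """Clamp a requested generation length to what still fits.

    Assumption: sequence_length covers prompt tokens plus generated tokens.
    """
    if prompt_tokens >= sequence_length:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, limit is {sequence_length}"
        )
    return min(requested, sequence_length - prompt_tokens)


def batches(prompts, batch_size=BATCH_SIZE):
    """Split prompts into chunks no larger than the compiled batch size."""
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]


print(max_new_tokens_for(1900))  # 148
print(batches(["a", "b", "c"]))  # [['a', 'b'], ['c']]
```

In practice the prompt token count would come from the tokenizer, for example `len(pipe.tokenizer(prompt)["input_ids"])`.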