Mistral on AWS Inf2 with FastAPI
Use FastAPI to quickly serve a Mistral model on an AWS Inferentia2 (Inf2) instance 🚀 Supports multimodal input via input_embeds 🖼️
Environment Setup
Follow the PyTorch Neuron Setup instructions in the Neuron documentation to set up the base environment.
Install Packages
Activate the virtual environment created during setup, then install the extra packages:
cd app
pip install -r requirements.txt
Run the App
uvicorn main:app --host 0.0.0.0 --port 8000
Send the Request
In a separate terminal, test with token IDs (input_ids, a normal text prompt):
cd client
python client.py
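The exact request format used by client.py is not shown here, so the following is only a minimal sketch of what an input_ids request could look like. It assumes a hypothetical POST route /generate on port 8000 and hypothetical JSON fields input_ids and max_new_tokens; check client/client.py and app/main.py for the real route and field names. Only the Python standard library is used.

```python
import json
import urllib.request

# Hypothetical route: the real path is defined in app/main.py.
SERVER_URL = "http://localhost:8000/generate"


def build_ids_request(input_ids, max_new_tokens=64, url=SERVER_URL):
    """Build a JSON POST request carrying token IDs (hypothetical schema)."""
    if not all(isinstance(t, int) for t in input_ids):
        raise TypeError("input_ids must be a list of ints")
    payload = {"input_ids": list(input_ids), "max_new_tokens": max_new_tokens}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60.0):
    """Send the request to the running server and decode its JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `send(build_ids_request(ids))` posts the IDs, where `ids` would normally come from the Mistral tokenizer. Building the request separately from sending it keeps the payload easy to inspect without a live server.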
Test with precomputed embeddings (input_embeds, the usual input for multimodal models, which bypasses the model's embedding layer):
cd client
python embeds_client.py
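Sending input_embeds means the client does the work of the embedding layer itself: for every token ID the model would otherwise fetch the matching row of its embedding table, and a multimodal pipeline can additionally insert vectors of the same width (for example projected image features). The sketch below illustrates that idea with a toy table; the hidden size of 4, the image-before-text ordering, the [batch, seq_len, hidden] nesting and the JSON field names are all assumptions for illustration, not what embeds_client.py necessarily does.

```python
import json

HIDDEN_SIZE = 4  # toy width for illustration; Mistral-7B uses 4096


def embed_tokens(input_ids, embedding_table):
    """Mimic the embedding layer: look up one table row per token ID."""
    return [list(embedding_table[i]) for i in input_ids]


def build_embeds_payload(text_embeds, image_embeds, max_new_tokens=64):
    """Prepend (hypothetical) image features to the text embeddings.

    Every vector must already have HIDDEN_SIZE elements, because the server
    skips the embedding layer and passes these vectors straight on to the
    transformer layers.
    """
    sequence = list(image_embeds) + list(text_embeds)
    for vec in sequence:
        if len(vec) != HIDDEN_SIZE:
            raise ValueError(f"expected vectors of size {HIDDEN_SIZE}, got {len(vec)}")
    # Nested as [batch=1, seq_len, hidden]; field names are hypothetical.
    return json.dumps({"input_embeds": [sequence], "max_new_tokens": max_new_tokens})
```

Validating the vector width on the client side catches the most common mistake with this input type, mixing features whose dimension does not match the model's hidden size, before anything reaches the server.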
Container
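You can build the container image from the Dockerfile, or run the prebuilt image. The command below exposes the first Neuron device (/dev/neuron0) to the container and publishes port 8000: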
docker run --rm --name mistral -d -p 8000:8000 --device=/dev/neuron0 public.ecr.aws/shtian/fastapi-mistral
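Because of `-d`, `docker run` returns immediately, while the app inside the container likely still needs some time to load the model before it can answer. A bare TCP connect to the published port may also succeed before the app is listening, since Docker can accept the connection on the host side. The helper below (a hypothetical name, not part of this repo) is a minimal sketch that instead waits until the server returns any HTTP response; it assumes the default `http://localhost:8000/` URL, where even a 404 proves FastAPI is up.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url="http://localhost:8000/", timeout=300.0, interval=2.0):
    """Poll url until the server sends back any HTTP response, or timeout elapses.

    Returns True as soon as any status code (even 404) arrives, False if the
    deadline passes first. Refused, reset, or timed-out connections mean the
    app is not ready yet.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            err.close()
            return True  # the app answered with an error status, so it is up
        except (OSError, http.client.HTTPException):
            pass  # not listening yet, or the connection was dropped
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, run `wait_for_http()` after `docker run` and only start client.py or embeds_client.py once it returns True. `urllib.error.HTTPError` is caught before the broader `OSError` because it is a subclass of it.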