Serving private and gated models
If the model you want to serve is gated or lives in a private repository on the Hugging Face Hub, you need access to the model in order to serve it.
Once you have confirmed that you have access to the model:
- Navigate to your account’s Profile | Settings | Access Tokens page.
- Generate and copy a read token.
If you're using the CLI, set the HF_API_TOKEN
environment variable. For example:
export HF_API_TOKEN=<YOUR READ TOKEN>
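Pasting the token inline like this records it in your shell history. As a sketch of one way around that, you could keep the token in a local file and export it from there; the helper name and file path below are just examples, not part of the official tooling:

```shell
# Example helper: export HF_API_TOKEN from a file so the raw token
# never appears on the command line or in shell history.
load_hf_token() {
    # $1: path to a file containing only the token
    if [ -r "$1" ]; then
        HF_API_TOKEN="$(cat "$1")"
        export HF_API_TOKEN
        return 0
    fi
    echo "token file not readable: $1" >&2
    return 1
}

# Usage (path is illustrative):
# load_hf_token "$HOME/.hf_token"
```

Remember to restrict the file's permissions (e.g. `chmod 600`) so other users on the machine cannot read it.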
Alternatively, you can provide the token when deploying the model with Docker:
model=<your private model>
volume=$PWD/data
token=<your Hugging Face Hub read token>

docker run --gpus all -e HF_API_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.6 --model-id $model