How to Use Hugging Face Inference Endpoints with JavaScript

In this post, I'll walk you through using Hugging Face Inference Endpoints and the @huggingface/inference JavaScript SDK with the textGeneration method.
To explore available models, check out: https://huggingface.co/tasks/text-generation. However, not all models will work; some may return an Error: Forbidden message.
What are Inference Endpoints?
Hugging Face Inference Endpoints are API endpoints that allow you to deploy machine learning models accessible over HTTP. With fully managed infrastructure, you don't have to worry about setup or maintenance. You simply choose the model and hardware, and Hugging Face takes care of the rest, creating an API endpoint for you to use.
For example:
import 'dotenv/config';
import { InferenceClient } from '@huggingface/inference';

const client = new InferenceClient(process.env.HF_ACCESS_TOKEN).endpoint(
  'https://g7u9q9gquz2wxrjn.us-east-1.aws.endpoints.huggingface.cloud'
);
The Back story
But, before we dive in, let me explain what led me to explore Inference Endpoints in the first place.
At Neon, I’ve been focused on improving our Dev/Test workflows, specifically automating synthetic data generation to support new guides in the Neon Twin section of our docs. I previously experimented with Anthropic’s Claude Sonnet models, which I detailed in this post: Vibe Coding With AI to Generate Synthetic Data: Part 1. That experiment got me thinking—what other options are out there?
So, I tested three models to see how they handled the task. Each one produced a unique response to the same prompt, which I’ve categorized as good, bad, and ugly:
- Good: mistralai/Mixtral-8x7B-Instruct-v0.1
- Bad: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- Ugly: microsoft/phi-4
Want to try them yourself? Below, I’ll walk you through the setup steps along with a sample prompt to test each model.
Prerequisites
- Create a Hugging Face Account.
- Add credits to your account, or sign up for a Pro account.
Create an Inference Endpoint
By creating an Inference Endpoint you'll be able to test various models via HTTP requests.
Deploy Model
Visit any of the model links above, then click the Deploy button and select HF Inference Endpoints.
Endpoint Configuration
On the next screen, give your endpoint a name (or leave the default), and configure the hardware. Once ready, click Create Endpoint.
Endpoint URL
It might take a few minutes for your endpoint to provision, but once it's ready, you'll be able to copy the Endpoint URL, which you'll need in a later step.
When initialization is complete, your endpoint will be ready to use, but you'll need to create an Access Token to make requests.
Create a Hugging Face Access Token
Head over to your Hugging Face account and click your profile picture in the top right-hand corner. From the dropdown menu, select Access Tokens.
On the Access Tokens page, click Create new token. On the next screen, select Read permissions, give your token a name, and then click Create token.
Make sure to note down your token, as you'll only be able to view it once. You'll need it in a later step.
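Before building the test application, you can verify that the endpoint and token work together by sending a single HTTP request. Below is a minimal sketch you could drop into a throwaway script; it assumes Node 18+ (for the built-in fetch), and both the URL and token are placeholders you'd swap for your own. The request body follows the standard { "inputs": "..." } format that text-generation endpoints expect.
// quick-check.js: a throwaway script to confirm the endpoint responds
(async () => {
  const response = await fetch('Add your Endpoint URL here', {
    method: 'POST',
    headers: {
      Authorization: 'Bearer Add your Access Token here',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ inputs: 'Hello' }),
  });
  console.log(await response.json());
})();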
Create a Test Application
Run the following command to initialize a new project:
npm init -y
Install Dependencies
Install the Hugging Face Inference SDK and dotenv:
npm install @huggingface/inference dotenv
Update package.json
In your package.json, find the type key and change it from commonjs to module. This allows you to use ESM JavaScript syntax.
- "type": "commonjs",
+ "type": "module",
Create files
Create two new files at the root of your project:
- index.js
- .env
Example self-invoking function
Add the following code to index.js, making sure to replace "Add your Endpoint URL here" with your actual Endpoint URL.
import 'dotenv/config';
import { InferenceClient } from '@huggingface/inference';

const client = new InferenceClient(process.env.HF_ACCESS_TOKEN).endpoint('Add your Endpoint URL here');

(async () => {
  try {
    const { generated_text } = await client.textGeneration({
      inputs: 'Explain Hugging Face Inference Endpoints.',
      parameters: {
        // Cap the length of the generated response
        max_new_tokens: 150,
      },
    });
    console.log(generated_text);
  } catch (error) {
    console.error('Error:', error);
  }
})();
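As a variation on the example above, the SDK can also stream the response token by token rather than returning it all at once. Here's a minimal sketch using the SDK's textGenerationStream method with the same client setup; swap it into index.js if you'd like to see tokens print as they arrive.
import 'dotenv/config';
import { InferenceClient } from '@huggingface/inference';

const client = new InferenceClient(process.env.HF_ACCESS_TOKEN).endpoint('Add your Endpoint URL here');

(async () => {
  try {
    // Iterate over the stream and print each token as it's generated
    for await (const chunk of client.textGenerationStream({
      inputs: 'Explain Hugging Face Inference Endpoints.',
      parameters: { max_new_tokens: 150 },
    })) {
      process.stdout.write(chunk.token.text);
    }
  } catch (error) {
    console.error('Error:', error);
  }
})();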
Environment variables
In the .env file, add the following line and paste in the Access Token you created earlier.
HF_ACCESS_TOKEN=
Test the application
Run the following command in your terminal to test the application:
node index.js
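Optionally, you can add a start script to the scripts section of package.json (a convenience only; node index.js works just as well), which lets you run the app with npm start:
"scripts": {
  "start": "node index.js"
}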
Result
The output will vary depending on which model you used. Below is the response from the mistralai/Mixtral-8x7B-Instruct-v0.1 model.
"Explain Hugging Face Inference Endpoints. Hugging Face Inference Endpoints are cloud-based services offered by Hugging Face that allow users to run large-scale natural language processing (NLP) and computer vision (CV) models in a highly scalable way. The endpoints allow developers to easily deploy, config, and scale a model API, enabling access through HTTP requests from their own applications. With Hugging Face Inference Endpoints, common use cases include language translation, text classification, text generation, and image classification. The endpoints are built using Hugging Face's Transformers library, which provides access to thousands of pre-trained models, and they are designed to be easy to use, allowing users to get started quickly with minimal setup and configuration."
Finished
There are tons of models to test, but figuring out which ones will actually work before provisioning an endpoint has been a real challenge for me. It can be time-consuming and frustrating. I'm still new to this, so maybe I'm missing something, or perhaps best practices are still evolving. Either way, this post should help you get up and running.
If you're working in this space and want to connect, feel free to reach out to me on X: @PaulieScanlon.