How to Use Hugging Face Inference Endpoints with JavaScript

Community Article Published March 17, 2025

In this post, I'll walk you through using Hugging Face Inference Endpoints and the @huggingface/inference JavaScript SDK with the textGeneration method.

To explore available models, check out: https://huggingface.co/tasks/text-generation. However, not all models will work; some may return an Error: Forbidden message.

What are Inference Endpoints?

Hugging Face Inference Endpoints are API endpoints that allow you to deploy machine learning models accessible over HTTP. With fully managed infrastructure, you don't have to worry about setup or maintenance. You simply choose the model and hardware, and Hugging Face takes care of the rest, creating an API endpoint for you to use.

For example:

import 'dotenv/config';
import { InferenceClient } from '@huggingface/inference';

const client = new InferenceClient(process.env.HF_ACCESS_TOKEN).endpoint(
  'https://g7u9q9gquz2wxrjn.us-east-1.aws.endpoints.huggingface.cloud'
);
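
Since the endpoint is just an HTTP API, you can also call it without the SDK. Here's a rough sketch using fetch (assuming the same endpoint URL and an HF_ACCESS_TOKEN environment variable, and ESM syntax like the SDK example above; the request body follows the text-generation task format):

import 'dotenv/config';

// POST a text-generation request directly to the endpoint over HTTP
const response = await fetch(
  'https://g7u9q9gquz2wxrjn.us-east-1.aws.endpoints.huggingface.cloud',
  {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.HF_ACCESS_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ inputs: 'Explain Hugging Face Inference Endpoints.' }),
  }
);

console.log(await response.json());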

The Backstory

But, before we dive in, let me explain what led me to explore Inference Endpoints in the first place.

At Neon, I’ve been focused on improving our Dev/Test workflows, specifically automating synthetic data generation to support new guides in the Neon Twin section of our docs. I previously experimented with Anthropic’s Claude Sonnet models, which I detailed in this post: Vibe Coding With AI to Generate Synthetic Data: Part 1. That experiment got me thinking—what other options are out there?

So, I tested three models to see how they handled the task. Each one produced a unique response to the same prompt, which I've categorized as good, bad, and ugly.

Want to try them yourself? Below, I’ll walk you through the setup steps along with a sample prompt to test each model.

Prerequisites

  1. Create a Hugging Face Account.
  2. Add credits to your account, or sign up for a Pro account.

Create an Inference Endpoint

By creating an Inference Endpoint you'll be able to test various models via HTTP requests.

Deploy Model

Visit any of the model links above, then click the Deploy button and select HF Inference Endpoints.

Endpoint Configuration

On the next screen, give your endpoint a name (or leave the default), and configure the hardware. Once ready, click Create Endpoint.

Endpoint URL

It might take a few minutes for your endpoint to provision, but once it's ready, you'll be able to copy the Endpoint URL, which you'll need in a later step.

When initialization is complete, your endpoint will be ready for use, but you'll need to create an Access Token to make requests.

Create a Hugging Face Access Token

Head over to your Hugging Face account and click your profile picture in the top right-hand corner. From the dropdown menu, select Access Tokens.

On the Access Tokens page, click Create new token. On the next screen, select Read permissions, give your token a name, and then click Create token.

Make sure to note down your token, as you'll only be able to view it once. You'll need it in a later step.

Create a Test Application

Run the following command to initialize a new project:

npm init -y

Install Dependencies

Install the Hugging Face Inference SDK and dotenv:

npm install @huggingface/inference dotenv

Update package.json

In your package.json, find the type key and change its value from commonjs to module (or add it if it isn't there). This allows you to use ESM JavaScript syntax.

- "type": "commonjs",
+ "type": "module",

Create files

Create two new files at the root of your project:

  1. index.js
  2. .env
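
On macOS or Linux you can create both files from the terminal:

touch index.js .env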

Example self-invoking function

Add the following code to index.js, making sure to replace "Add your Endpoint URL here" with your actual Endpoint URL.

import 'dotenv/config';
import { InferenceClient } from '@huggingface/inference';

const client = new InferenceClient(process.env.HF_ACCESS_TOKEN).endpoint(
  'Add your Endpoint URL here'
);

(async () => {
  try {
    const { generated_text } = await client.textGeneration({
      inputs: 'Explain Hugging Face Inference Endpoints.',
      // max_new_tokens caps the length of the generated response
      parameters: { max_new_tokens: 150 },
    });

    console.log(generated_text);
  } catch (error) {
    console.error('Error:', error);
  }
})();
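
If you'd prefer to stream tokens back as they're generated, the SDK also provides a textGenerationStream method. Here's a minimal sketch using the same client and prompt (this assumes your deployed endpoint supports streaming, which text-generation endpoints generally do):

(async () => {
  try {
    // Each chunk contains the latest generated token; print tokens as they arrive
    for await (const chunk of client.textGenerationStream({
      inputs: 'Explain Hugging Face Inference Endpoints.',
      parameters: { max_new_tokens: 150 },
    })) {
      process.stdout.write(chunk.token.text);
    }
  } catch (error) {
    console.error('Error:', error);
  }
})();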

Environment variables

In the .env file, add the following line and paste in the Access Token you created earlier.

HF_ACCESS_TOKEN=
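
Hugging Face Access Tokens typically start with hf_, so once you've pasted yours in, the file will look something like this (the value below is a placeholder):

HF_ACCESS_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx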

Test the application

Run the following command in your terminal to test the application:

node index.js

Result

The output will vary depending on which model you used. Below is the response from the mistralai/Mixtral-8x7B-Instruct-v0.1 model.

"Explain Hugging Face Inference Endpoints. Hugging Face Inference Endpoints are cloud-based services offered by Hugging Face that allow users to run large-scale natural language processing (NLP) and computer vision (CV) models in a highly scalable way. The endpoints allow developers to easily deploy, config, and scale a model API, enabling access through HTTP requests from their own applications. With Hugging Face Inference Endpoints, common use cases include language translation, text classification, text generation, and image classification. The endpoints are built using Hugging Face's Transformers library, which provides access to thousands of pre-trained models, and they are designed to be easy to use, allowing users to get started quickly with minimal setup and configuration."

Finished

There are tons of models to test, but figuring out which ones will actually work before provisioning an endpoint has been a real challenge for me. It can be time-consuming and frustrating. I'm still new to this, so maybe I'm missing something, or perhaps best practices are still evolving. Either way, this post should help you get up and running.

If you're working in this space and want to connect, feel free to reach out to me on X: @PaulieScanlon.
