Building a Motoku LLM Retrieval System Using Internet Computer Protocol, Motoko, and Node.js

Community Article Published June 27, 2024


The Internet Computer Protocol (ICP) gives developers a platform for building decentralized applications. Combining Motoko, a language designed specifically for ICP, with Node.js provides the storage layer for a Large Language Model (LLM) retrieval system. This article walks you through building such a system, focusing on its key components: embedding storage and retrieval.

Prerequisites

Before diving into the implementation, ensure you have the following tools and knowledge:

  • Basic understanding of Motoko and Node.js.
  • Node.js and npm installed on your machine.
  • DFINITY SDK (dfx) installed.
  • Basic knowledge of RESTful APIs.

Step 1: Setting Up the Motoko Canister

First, we'll create a Motoko canister to store and retrieve embeddings.

1.1 Define the EmbeddingStore Actor

Create a new file named EmbeddingStore.mo and define the EmbeddingStore actor as follows:

import Array "mo:base/Array";
import Time "mo:base/Time";
import Error "mo:base/Error";

actor EmbeddingStore {
    type Embedding = {
        text: Text;
        embedding: [Float];
        createdAt: Int;
    };

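    stable var embeddings: [Embedding] = [];
    // Demo value only: replace with your own secret key before deploying
    stable let secretKey: Text = "8529741360";

    // Appends one embedding; callers must present the shared secret key
    public shared func storeEmbedding(key: Text, text: Text, embedding: [Float]) : async () {
        if (key == secretKey) {
            let timestamp = Time.now(); // nanoseconds since the Unix epoch
            // Array.append copies the whole array on each call, which is fine for a
            // small demo; consider mo:base/Buffer for larger stores
            embeddings := Array.append(embeddings, [{
                text = text;
                embedding = embedding;
                createdAt = timestamp;
            }]);
        } else {
            throw Error.reject("Invalid key. Access denied.");
        }
    };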

    public query func getEmbeddings() : async [Embedding] {
        return embeddings;
    };
};

1.2 Deploy the Canister

Deploy the EmbeddingStore canister to the Internet Computer:

  1. Open a terminal and navigate to your project directory.
  2. Run dfx start to start the local replica.
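  3. Register the canister by adding an entry for EmbeddingStore.mo to the canisters section of your dfx.json file.
  4. Deploy the canister using dfx deploy, then note its canister ID (dfx canister id <canister-name> prints it); the server needs it in Step 2.3.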

Step 2: Setting Up the Node.js Server

Next, we'll set up a Node.js server to interact with the Motoko canister.

2.1 Initialize the Project

  1. Create a new directory for your Node.js project.
  2. Initialize the project by running npm init -y.
  3. Install the necessary dependencies:
npm install express body-parser @dfinity/agent dotenv

2.2 Create the Server Script

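Create a new file named server.js and add the following code. It assumes the canister's Candid interface bindings (an idlFactory, such as the one dfx generate produces) are available as a CommonJS module at ./idl/embedding_store.did.js: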

const express = require('express');
const bodyParser = require('body-parser');
const { HttpAgent, Actor } = require('@dfinity/agent');
const { idlFactory } = require('./idl/embedding_store.did.js');
require('dotenv').config();

const app = express();
const port = 3000;

app.use(bodyParser.json());

const canisterId = process.env.CANISTER_ID;
const host = process.env.HOST;

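// Initialize the agent
const agent = new HttpAgent({ host });
// Fetch the replica's root key. Needed for a local replica only;
// remove this call when targeting mainnet.
agent.fetchRootKey().catch((error) => {
    console.error(`Unable to fetch root key: ${error.message}`);
});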

// Create an actor instance
const embeddingStore = Actor.createActor(idlFactory, {
    agent,
    canisterId,
});

// Helper function to convert BigInt to a string for JSON serialization
const serializeBigInt = (obj) => {
    if (typeof obj === 'bigint') {
        return obj.toString();
    } else if (Array.isArray(obj)) {
        return obj.map(serializeBigInt);
    } else if (typeof obj === 'object' && obj !== null) {
        return Object.fromEntries(
            Object.entries(obj).map(([k, v]) => [k, serializeBigInt(v)])
        );
    }
    return obj;
};

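app.post('/storeEmbedding', async (req, res) => {
    const { key, text, embedding } = req.body;
    // Reject bad credentials and malformed input before calling the canister
    if (key !== process.env.SECRET_KEY) {
        return res.status(401).send('Error: Invalid key');
    }
    if (typeof text !== 'string' || !Array.isArray(embedding)) {
        return res.status(400).send('Error: text must be a string and embedding must be an array');
    }
    try {
        // Candid float64 values map to JavaScript numbers
        const embeddingFloat64 = embedding.map(Number);
        await embeddingStore.storeEmbedding(key, text, embeddingFloat64);
        res.status(200).send('Embedding stored successfully.');
    } catch (error) {
        res.status(500).send(`Error: ${error.message}`);
    }
});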

app.get('/getEmbeddings', async (req, res) => {
    try {
        const embeddings = await embeddingStore.getEmbeddings();
        res.status(200).json(serializeBigInt(embeddings));
    } catch (error) {
        res.status(500).send(`Error: ${error.message}`);
    }
});

app.listen(port, () => {
    console.log(`Server is running on http://localhost:${port}`);
});
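A stricter check on the request body can catch malformed input before it reaches the canister. The following is a minimal sketch rather than part of the original server: validateStorePayload is a hypothetical helper name, and the rules it enforces (non-empty text, non-empty array of finite numbers) are assumptions about what a reasonable embedding looks like.

```javascript
// Hypothetical helper (not part of the original server): checks the shape of
// a /storeEmbedding request body and returns a list of problems found.
function validateStorePayload(body) {
    if (body === null || typeof body !== 'object') {
        return ['request body must be a JSON object'];
    }
    const errors = [];
    if (typeof body.text !== 'string' || body.text.length === 0) {
        errors.push('text must be a non-empty string');
    }
    if (!Array.isArray(body.embedding) || body.embedding.length === 0) {
        errors.push('embedding must be a non-empty array');
    } else if (!body.embedding.every((x) => typeof x === 'number' && Number.isFinite(x))) {
        errors.push('embedding must contain only finite numbers');
    }
    return errors;
}

// An empty array means the payload passed every check.
const validResult = validateStorePayload({ text: 'example text', embedding: [0.1, 0.2, 0.3] });
const invalidResult = validateStorePayload({ text: 'example text', embedding: ['a', 0.2] });
console.log(validResult, invalidResult);
```

Inside the route handler, a non-empty result could be returned to the client with a 400 status instead of being forwarded to the canister.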

2.3 Environment Configuration

Create a .env file in your project directory and add the following environment variables:

CANISTER_ID=<your-canister-id>
HOST=http://localhost:8000
SECRET_KEY=8529741360

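Replace <your-canister-id> with the actual canister ID obtained from the deployment step, and set HOST to the address your local replica listens on (recent dfx versions default to port 4943 rather than 8000). SECRET_KEY must match the secretKey value in EmbeddingStore.mo.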
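Because a missing or misspelled variable only surfaces later as a confusing agent error, it can help to check the configuration at startup. The sketch below is one way to do that; loadConfig is a hypothetical function, and the isLocal flag is an assumed convenience for deciding whether a local-only step such as fetchRootKey applies.

```javascript
// Hypothetical startup check: the variable names match the .env file above,
// but the function itself is an illustration, not part of the original server.
function loadConfig(env) {
    const required = ['CANISTER_ID', 'HOST', 'SECRET_KEY'];
    const missing = required.filter((name) => !env[name]);
    if (missing.length > 0) {
        throw new Error(`Missing environment variables: ${missing.join(', ')}`);
    }
    // new URL() throws a TypeError if HOST is not a valid absolute URL.
    const hostUrl = new URL(env.HOST);
    const isLocal = ['localhost', '127.0.0.1'].includes(hostUrl.hostname);
    return {
        canisterId: env.CANISTER_ID,
        host: env.HOST,
        secretKey: env.SECRET_KEY,
        isLocal,
    };
}

// In server.js this would be called as loadConfig(process.env) after dotenv has run.
const config = loadConfig({
    CANISTER_ID: 'example-canister-id', // placeholder, not a real canister ID
    HOST: 'http://localhost:8000',
    SECRET_KEY: '8529741360',
});
console.log(config.isLocal);
```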

Step 3: Testing the System

With the canister deployed and the server set up, it's time to test the embedding storage and retrieval functionality.

3.1 Storing an Embedding

Use curl or a tool like Postman to store an embedding:

curl -X POST http://localhost:3000/storeEmbedding \
     -H "Content-Type: application/json" \
     -d '{"key": "8529741360", "text": "example text", "embedding": [0.1, 0.2, 0.3]}'
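The same request can be sent from JavaScript, which is convenient when embeddings come from another script. This is a sketch under two assumptions: the server from Step 2 is listening on http://localhost:3000, and the runtime is Node.js 18 or later, where fetch is available globally. The helper names buildStoreRequest and storeEmbedding are illustrative.

```javascript
// Hypothetical client helpers mirroring the curl command above.
function buildStoreRequest(baseUrl, key, text, embedding) {
    return {
        url: `${baseUrl}/storeEmbedding`,
        options: {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({ key, text, embedding }),
        },
    };
}

async function storeEmbedding(baseUrl, key, text, embedding) {
    const { url, options } = buildStoreRequest(baseUrl, key, text, embedding);
    const response = await fetch(url, options); // global fetch: Node.js 18+
    const message = await response.text();
    if (!response.ok) {
        throw new Error(`Request failed (${response.status}): ${message}`);
    }
    return message;
}

// Example call (requires the server to be running):
// storeEmbedding('http://localhost:3000', '8529741360', 'example text', [0.1, 0.2, 0.3])
//     .then(console.log)
//     .catch(console.error);
```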

3.2 Retrieving Embeddings

Retrieve stored embeddings by accessing the following endpoint:

curl http://localhost:3000/getEmbeddings
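In the response, createdAt arrives as a string rather than a number. The canister stores it as a Motoko Int, which the agent decodes as a JavaScript BigInt, and JSON.stringify throws a TypeError on BigInt values; that is why the server passes results through serializeBigInt. The sketch below reproduces this with a hand-written sample record (the timestamp is illustrative, not real canister output) and shows how a client might turn the nanosecond timestamp from Time.now() back into a date.

```javascript
// Hand-written sample shaped like one record from getEmbeddings().
const sample = [
    { text: 'example text', embedding: [0.1, 0.2, 0.3], createdAt: 1719446400000000000n },
];

// Same helper as in server.js.
const serializeBigInt = (obj) => {
    if (typeof obj === 'bigint') {
        return obj.toString();
    } else if (Array.isArray(obj)) {
        return obj.map(serializeBigInt);
    } else if (typeof obj === 'object' && obj !== null) {
        return Object.fromEntries(
            Object.entries(obj).map(([k, v]) => [k, serializeBigInt(v)])
        );
    }
    return obj;
};

// Serializing the raw record fails because of the BigInt field.
let plainJsonFailed = false;
try {
    JSON.stringify(sample);
} catch (error) {
    plainJsonFailed = error instanceof TypeError;
}

// After conversion the record serializes cleanly.
const serialized = serializeBigInt(sample);
const json = JSON.stringify(serialized);

// Nanoseconds to milliseconds, then to an ISO date string.
const createdAtMs = Number(BigInt(serialized[0].createdAt) / 1000000n);
const createdAtIso = new Date(createdAtMs).toISOString();
console.log(plainJsonFailed, json, createdAtIso);
```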

Conclusion

Congratulations! You have successfully built a Motoku LLM retrieval system using Internet Computer Protocol, Motoko, and Node.js. This system lets you store text embeddings in a canister on the Internet Computer, with writes gated by a shared key, and retrieve them through a simple REST API. As a next step, consider adding more features such as advanced search capabilities, stronger authentication mechanisms, and integration with a frontend application.
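As one possible starting point for the search capability mentioned above, stored embeddings can be ranked against a query embedding by cosine similarity on the Node.js side. This is a sketch, not part of the system built in this article: cosineSimilarity and topK are hypothetical helpers, the records are toy data, and a real deployment would obtain the query embedding from the same model that produced the stored ones.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
    if (a.length !== b.length) {
        throw new Error('Vectors must have the same length');
    }
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // A zero vector has no direction; treat its similarity as 0.
    if (normA === 0 || normB === 0) {
        return 0;
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k records most similar to the query, highest score first.
function topK(queryEmbedding, records, k) {
    return records
        .map((record) => ({
            text: record.text,
            score: cosineSimilarity(queryEmbedding, record.embedding),
        }))
        .sort((x, y) => y.score - x.score)
        .slice(0, k);
}

// Toy records shaped like the output of /getEmbeddings.
const records = [
    { text: 'cats', embedding: [1, 0, 0] },
    { text: 'dogs', embedding: [0.9, 0.1, 0] },
    { text: 'stocks', embedding: [0, 0, 1] },
];
const results = topK([1, 0, 0], records, 2);
console.log(results.map((r) => r.text));
```

Ranking in Node.js keeps the canister simple, at the cost of transferring every embedding on each query; moving the comparison into the canister is a design choice to revisit as the store grows.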