Building a Custom Retrieval System with Motoko and Node.js
In this tutorial, we’ll walk through building a custom embedding storage and retrieval system using Motoko (a smart contract language for the Internet Computer) and Node.js (a JavaScript runtime for building server-side applications). This system can store, retrieve, and manage embeddings—numerical representations often used in machine learning or AI applications, like recommendation engines or NLP systems.
What We’ll Cover
1. Understanding the Problem Space
- Why embeddings are important.
- The challenges of storing embeddings efficiently.
2. System Design Overview
- The role of Motoko for storage.
- Node.js as a bridge to expose a REST API.
3. Step-by-Step Implementation
- Setting up the Motoko canister.
- Integrating Node.js with the canister.
- Building the REST API.
4. Enhancing and Scaling
- Security considerations.
- Potential optimizations.
1. Understanding the Problem Space
What are embeddings?
Embeddings are dense numerical representations of data that capture semantic meaning. For example:
- In NLP, embeddings represent words or sentences as vectors, so that texts with similar meanings lie closer together numerically.
- In recommendation systems, embeddings are used to compare items and users.
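To make "numerically closer" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The function name and the three-dimensional toy vectors are made up for illustration; real embeddings usually have hundreds or thousands of dimensions.

```javascript
// Cosine similarity: 1 means same direction, 0 means orthogonal, -1 means opposite.
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('Vectors must be non-empty and of equal length.');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine similarity is undefined for a zero vector.');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors (illustrative values, not produced by a real model).
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const car = [0.0, 0.1, 0.9];

// "cat" should score closer to "kitten" than to "car".
console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, car)); // true
```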
Challenges:
- Storage: Embeddings are often arrays of floats, requiring structured storage.
- Retrieval: Efficient querying of embeddings is crucial, especially for large datasets.
- Integration: Exposing these embeddings via a secure and accessible API.
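As a rough feel for the storage challenge, the sketch below estimates the raw size of the float data alone. Motoko's Float is a 64-bit value, so each dimension takes 8 bytes. The dimension count and the number of embeddings are assumptions chosen for illustration, and the estimate ignores the text, timestamps, and any per-record overhead.

```javascript
const BYTES_PER_FLOAT64 = 8;

// Raw bytes needed for `count` embeddings of `dimensions` floats each.
function estimateRawBytes(dimensions, count) {
  return dimensions * BYTES_PER_FLOAT64 * count;
}

// Assumed example: 1536-dimensional embeddings.
console.log(estimateRawBytes(1536, 1)); // 12288 bytes for one embedding
console.log(estimateRawBytes(1536, 10000) / (1024 * 1024)); // 117.1875 MiB for 10,000
```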
2. System Design Overview
Architecture:
- Motoko Canister: A smart contract deployed on the Internet Computer to store embeddings persistently.
- Node.js Server: Acts as a bridge, exposing REST endpoints for users to interact with the canister.
- Frontend/Client: (Optional) Can interact with the Node.js API for UI/UX.
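From the client's point of view, the whole architecture collapses into two HTTP calls against the Node.js server. The sketch below is a hypothetical client helper: createEmbeddingClient, baseUrl, and fetchImpl are illustrative names, the routes are the ones built in Step 2, and the default relies on the global fetch available in Node.js 18 or later and in browsers.

```javascript
// Hypothetical client wrapper around the REST API from Step 2.
// fetchImpl is injectable so the helper can be exercised without a running server.
function createEmbeddingClient(baseUrl, fetchImpl = globalThis.fetch) {
  async function storeEmbedding(text, embedding) {
    const response = await fetchImpl(`${baseUrl}/storeEmbedding`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text, embedding }),
    });
    if (!response.ok) {
      throw new Error(`storeEmbedding failed with HTTP ${response.status}`);
    }
    return response.text();
  }

  async function getEmbeddings() {
    const response = await fetchImpl(`${baseUrl}/getEmbeddings`);
    if (!response.ok) {
      throw new Error(`getEmbeddings failed with HTTP ${response.status}`);
    }
    return response.json();
  }

  return { storeEmbedding, getEmbeddings };
}

// Usage, assuming the server from Step 2 is listening locally:
// const client = createEmbeddingClient('http://localhost:3000');
// await client.storeEmbedding('Sample Text', [0.1, 0.2, 0.3]);
```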
3. Step-by-Step Implementation
Step 1: Setting up the Motoko Canister
Install the DFINITY SDK:
sh -ci "$(curl -fsSL https://smartcontracts.org/install.sh)"
Create a new Motoko project:
dfx new embedding-store
cd embedding-store
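Define the EmbeddingStore actor in main.mo:
import Array "mo:base/Array";
import Time "mo:base/Time";

actor EmbeddingStore {
  // One stored record: the source text, its vector, and a creation timestamp.
  type Embedding = {
    text : Text;
    embedding : [Float];
    createdAt : Int; // nanoseconds since the Unix epoch, from Time.now()
  };

  // `stable` keeps the array across canister upgrades.
  stable var embeddings : [Embedding] = [];

  // Append a new record. Array.append copies the whole array on every call
  // and is deprecated in the base library, so this suits small stores only.
  public shared func storeEmbedding(text : Text, embedding : [Float]) : async () {
    let timestamp = Time.now();
    embeddings := Array.append(embeddings, [{
      text = text;
      embedding = embedding;
      createdAt = timestamp;
    }]);
  };

  // Read-only query that returns every stored record.
  public query func getEmbeddings() : async [Embedding] {
    return embeddings;
  };
};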
Deploy the Canister: Update dfx.json to define your canister, then deploy:
dfx start --background
dfx deploy
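Test the Canister: Use dfx canister call to test the methods. Arguments use Candid's text format, which writes a vector as vec { ...; ... } rather than as a JSON-style array. If your dfx.json uses a different canister name, substitute it for embedding-store:
dfx canister call embedding-store storeEmbedding '("Sample Text", vec { 1.0; 0.5; 0.25 })'
dfx canister call embedding-store getEmbeddings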
Step 2: Setting up the Node.js Server
Initialize a Node.js Project:
mkdir embedding-api
cd embedding-api
npm init -y
npm install express body-parser @dfinity/agent dotenv
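Create the index.js File:
const express = require('express');
const bodyParser = require('body-parser');
const { HttpAgent, Actor } = require('@dfinity/agent');
// Candid bindings for the canister. dfx generates these as an ES module,
// so this require assumes a CommonJS-compatible copy placed in ./idl.
const { idlFactory } = require('./idl/embedding_store.did.js');
require('dotenv').config();

const app = express();
const port = 3000;
app.use(bodyParser.json());

// CANISTER_ID and HOST are read from the .env file.
const canisterId = process.env.CANISTER_ID;
const host = process.env.HOST;

const agent = new HttpAgent({ host });
const embeddingStore = Actor.createActor(idlFactory, { agent, canisterId });

app.post('/storeEmbedding', async (req, res) => {
  const { text, embedding } = req.body;
  try {
    // Coerce every element to a JavaScript number (float64).
    const embeddingFloat64 = embedding.map(Number);
    await embeddingStore.storeEmbedding(text, embeddingFloat64);
    res.status(200).send('Embedding stored successfully.');
  } catch (error) {
    res.status(500).send(`Error: ${error.message}`);
  }
});

app.get('/getEmbeddings', async (req, res) => {
  try {
    const embeddings = await embeddingStore.getEmbeddings();
    // createdAt is a Motoko Int, which arrives as a BigInt. JSON cannot
    // encode BigInt, so send it as a string.
    res.status(200).json(embeddings.map((e) => ({
      text: e.text,
      embedding: Array.from(e.embedding),
      createdAt: e.createdAt.toString(),
    })));
  } catch (error) {
    res.status(500).send(`Error: ${error.message}`);
  }
});

// fetchRootKey is meant for a local replica only; do not call it against mainnet.
// Wait for it to finish before accepting requests.
agent.fetchRootKey()
  .then(() => {
    app.listen(port, () => {
      console.log(`Server is running on http://localhost:${port}`);
    });
  })
  .catch((error) => {
    console.error(`Could not fetch the root key: ${error.message}`);
    process.exit(1);
  });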
Run the Server:
node index.js
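The POST handler passes the request body to the canister after coercing each element with Number, which silently turns a non-numeric value into NaN. A minimal validation sketch that could run before the canister call is shown below; validateEmbeddingPayload and its return shape are hypothetical, not part of the tutorial's code.

```javascript
// Hypothetical validator for the /storeEmbedding request body.
// Returns { ok: true, value } on success or { ok: false, error } on failure.
function validateEmbeddingPayload(body) {
  if (body === null || typeof body !== 'object') {
    return { ok: false, error: 'Body must be a JSON object.' };
  }
  const { text, embedding } = body;
  if (typeof text !== 'string' || text.length === 0) {
    return { ok: false, error: '"text" must be a non-empty string.' };
  }
  if (!Array.isArray(embedding) || embedding.length === 0) {
    return { ok: false, error: '"embedding" must be a non-empty array.' };
  }
  // Reject strings, null, NaN, and Infinity instead of coercing them.
  const allFinite = embedding.every((v) => typeof v === 'number' && Number.isFinite(v));
  if (!allFinite) {
    return { ok: false, error: '"embedding" must contain only finite numbers.' };
  }
  return { ok: true, value: { text, embedding } };
}

console.log(validateEmbeddingPayload({ text: 'Sample Text', embedding: [0.1, 0.2, 0.3] }).ok); // true
console.log(validateEmbeddingPayload({ text: 'Sample Text', embedding: [0.1, 'abc'] }).ok); // false
```

Inside the route, a failed check would typically map to a 400 response rather than the generic 500 from the catch block.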
Step 3: Interacting with the API
Storing an Embedding: Use a tool like curl or Postman to send a POST request:
curl -X POST http://localhost:3000/storeEmbedding \
  -H "Content-Type: application/json" \
  -d '{"text":"Sample Text","embedding":[0.1,0.2,0.3]}'
Retrieving Embeddings: Send a GET request:
curl http://localhost:3000/getEmbeddings
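A note on the createdAt field: Time.now() in Motoko yields nanoseconds since the Unix epoch as an Int, and the JavaScript agent decodes an Int as a BigInt, which JSON.stringify cannot serialize. One possible way to turn a record into a JSON-safe object with a readable timestamp is sketched below; toJsonSafe and the sample record are illustrative.

```javascript
// Hypothetical helper: convert one canister record into a JSON-safe object.
function toJsonSafe(record) {
  // Nanoseconds (BigInt) -> milliseconds (Number) for the Date constructor.
  const millis = Number(record.createdAt / 1_000_000n);
  return {
    text: record.text,
    // Array.from copes with either a plain array or a typed array.
    embedding: Array.from(record.embedding),
    createdAt: new Date(millis).toISOString(),
  };
}

// Sample record shaped like the canister's Embedding type (values are made up).
const sample = {
  text: 'Sample Text',
  embedding: [0.1, 0.2, 0.3],
  createdAt: 1_700_000_000_000_000_000n,
};

console.log(JSON.stringify(toJsonSafe(sample)));
// {"text":"Sample Text","embedding":[0.1,0.2,0.3],"createdAt":"2023-11-14T22:13:20.000Z"}
```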
4. Enhancing and Scaling
- Security: Use API keys, HTTPS, and rate-limiting to secure endpoints.
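- Performance: Array.append copies the whole array on every insert, and getEmbeddings returns every record. A growable buffer, paginated reads, or a vector index for similarity search would reduce that cost as the store grows.
- Scaling: Shard large collections of embeddings across multiple canisters for horizontal scaling.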
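The API-key and rate-limiting ideas above can be sketched as one Express-style middleware. This is an in-memory, single-process illustration under several assumptions: the x-api-key header name, the createGuard factory, and its options are all hypothetical, and a real deployment would more likely rely on a maintained rate-limiting package and a shared store.

```javascript
// Hypothetical guard: rejects unknown API keys, then applies a fixed-window limit per key.
// `now` is injectable so the window logic can be tested without waiting.
function createGuard({ apiKeys, limit, windowMs, now = Date.now }) {
  const windows = new Map(); // apiKey -> { start, count }

  return function guard(req, res, next) {
    // Node lowercases incoming header names.
    const key = req.headers['x-api-key'];
    // Note: Set lookup is not a constant-time comparison; a hardened version
    // might compare with crypto.timingSafeEqual instead.
    if (typeof key !== 'string' || !apiKeys.has(key)) {
      return res.status(401).send('Missing or invalid API key.');
    }

    const t = now();
    let current = windows.get(key);
    if (!current || t - current.start >= windowMs) {
      current = { start: t, count: 0 }; // start a fresh window
      windows.set(key, current);
    }
    current.count += 1;
    if (current.count > limit) {
      return res.status(429).send('Too many requests.');
    }
    return next();
  };
}

// Usage with the Express app from Step 2 (API_KEY is an assumed environment variable):
// app.use(createGuard({ apiKeys: new Set([process.env.API_KEY]), limit: 60, windowMs: 60000 }));
```

HTTPS is usually handled in front of the Node.js process, for example by a reverse proxy, so it does not appear in the sketch.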
Closing Thoughts
By combining Motoko’s decentralized, persistent storage capabilities with Node.js’s ease of building APIs, this tutorial showcases a practical system for storing and retrieving embeddings. This setup is modular and can be enhanced with additional features like filtering, vector similarity search, or integration with frontend systems.
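As a starting point for the vector similarity search mentioned above, a brute-force scan over the records returned by getEmbeddings is often enough for small stores. The sketch below is illustrative only: topK and cosineScore are hypothetical names, the records are toy data, and the scan is linear in the number of records, so larger collections would need a proper index.

```javascript
// Cosine similarity between two equal-length, non-zero vectors.
function cosineScore(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k records most similar to `query`, best match first.
function topK(records, query, k) {
  return records
    .map((record) => ({ text: record.text, score: cosineScore(record.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy records shaped like the API's output (values are made up).
const records = [
  { text: 'car', embedding: [0, 1] },
  { text: 'cat', embedding: [1, 0] },
  { text: 'dog', embedding: [0.9, 0.1] },
];

console.log(topK(records, [1, 0], 2).map((r) => r.text)); // [ 'cat', 'dog' ]
```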
If you have any questions or ideas for expanding this system, feel free to reach out! Let’s build scalable, efficient solutions together. 🚀