

Here's how to use EnergeticAI's pre-trained embeddings model.


Need an introduction to embeddings? Check out this overview.

About the model

EnergeticAI uses the lightweight, English-only version of the Universal Sentence Encoder from Google. The model is trained on a variety of data sources and is designed as a general-purpose encoder for tasks such as semantic similarity, clustering, and classification.

Given a sentence or short paragraph in English, the model will return a 512-dimensional vector that represents the meaning of the text.

Creating embeddings from text

You can install the embeddings package using npm:

npm install --save @energetic-ai/core @energetic-ai/embeddings

The embeddings package can compute embeddings for a single string or multiple strings at once. If you pass in an array of strings, you'll get an array of embeddings back. If you pass a single string, you'll get a single embedding back.

import { initModel } from "@energetic-ai/embeddings";

(async () => {
  const model = await initModel();

  // Embed a single string
  const embedding = await model.embed(
    "Embeddings are a powerful machine learning tool"
  );

  // Embed multiple strings at once
  const [healthy, delicious] = await model.embed([
    "Fruit is healthy",
    "Fruit is delicious",
  ]);
})();

Improving cold-start performance

The first time you call initModel(), it will download the model weights from the internet. This can take a few seconds, but you can speed it up by installing the English language model weights:

npm install --save @energetic-ai/model-embeddings-en

Then, you can pass the model weights directly into initModel():

import { initModel } from "@energetic-ai/embeddings";
import { modelSource } from "@energetic-ai/model-embeddings-en";

(async () => {
  const model = await initModel(modelSource);
  // ... snip ...
})();

Comparing embeddings

The embeddings package also exports a convenient distance() function that computes the cosine similarity between two embeddings. Despite the name, higher values mean more similar: the result is a number between 0 and 1, where 0 means the embeddings are completely different and 1 means they are identical.

import { initModel, distance } from "@energetic-ai/embeddings";

(async () => {
  const model = await initModel();
  const [healthy, delicious, embeddings] = await model.embed([
    "Fruit is healthy",
    "Fruit is delicious",
    "Embeddings are a powerful machine learning tool",
  ]);

  console.log(distance(healthy, delicious)); // 0.89 (high similarity)
  console.log(distance(healthy, embeddings)); // 0.24 (low similarity)
})();

If you're building something simple, it's worth starting with this function before you try to build something more complex (e.g. using a vector database). It's pretty fast, and it's often good enough.
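
For a small collection, a brute-force search of this kind might look like the sketch below, which ranks stored embeddings against a query embedding. It is an illustration rather than part of EnergeticAI: cosineSimilarity and topK are hypothetical helpers written here so the example runs without the library, and the three-dimensional vectors stand in for real 512-dimensional embeddings. In an application you would likely call the package's distance() in place of cosineSimilarity.

```javascript
// Hypothetical helper: cosine similarity of two equal-length numeric arrays.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Embeddings must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: score every item against the query and keep the k best.
function topK(query, items, k) {
  return items
    .map((item) => ({
      text: item.text,
      score: cosineSimilarity(query, item.embedding),
    }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}

// Toy 3-dimensional vectors standing in for real embeddings.
const items = [
  { text: "Fruit is healthy", embedding: [0.9, 0.1, 0.0] },
  { text: "Fruit is delicious", embedding: [0.8, 0.3, 0.1] },
  { text: "Embeddings are a powerful machine learning tool", embedding: [0.0, 0.2, 0.9] },
];

const results = topK([1.0, 0.2, 0.0], items, 2);
console.log(results.map((r) => r.text));
```

Every query compares against every stored item, so the cost grows linearly with the size of the collection. That is usually acceptable for thousands of items and is the point at which the indexed options in the next section start to pay off.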

Storing embeddings

If you're trying to make comparisons with millions of items or more, it's worth considering a vector database such as Postgres, Redis, or Milvus, which can perform these comparisons efficiently.

Open-source vector databases

General purpose databases with vector support:

  • Postgres: You can store and index vectors in Postgres using the pgvector extension. A number of managed Postgres providers, including Supabase and Neon, support this extension. Some of the managed Postgres offerings from larger cloud providers also support pgvector, including Amazon's RDS.
  • Redis: You can store and index vectors in Redis using the RediSearch module.
  • SQLite: You can store and index vectors in SQLite using the sqlite-vss extension, which leverages Meta's Faiss library under the hood.
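
As a rough illustration of the Postgres option, the sketch below prepares an embedding for pgvector, which accepts vectors written as a bracketed, comma-separated list such as [0.1,0.2,0.3] and provides the <=> operator for cosine distance. The items table, its content and embedding columns, and the toPgVector helper are assumptions made for this example. The queries are plain strings that you would pass, along with their parameters, to whichever Postgres client you use.

```javascript
// Hypothetical helper: serialize an embedding into pgvector's text format.
function toPgVector(embedding) {
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("Embedding must contain only finite numbers");
  }
  return `[${embedding.join(",")}]`;
}

// Assumed schema (requires the pgvector extension to be enabled):
//   CREATE TABLE items (id bigserial PRIMARY KEY, content text, embedding vector(512));
const insertSql =
  "INSERT INTO items (content, embedding) VALUES ($1, $2::vector)";

// `<=>` is pgvector's cosine distance operator: smaller means more similar.
const searchSql =
  "SELECT content FROM items ORDER BY embedding <=> $1::vector LIMIT 5";

// Parameters for the insert. A short vector is used only to keep the output readable;
// it would not fit the vector(512) column in the assumed schema.
const insertParams = ["Fruit is healthy", toPgVector([0.25, -0.5, 1])];
console.log(insertSql, insertParams);
console.log(searchSql);
```

Note that pgvector's operator returns a distance, so results are ordered ascending, the opposite direction from the similarity score that distance() returns.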

Dedicated vector databases:

  • Chroma: The open-source vector database called Chroma is designed for AI use-cases and has an official JavaScript client.
  • Milvus: The dedicated open-source vector database Milvus comes with a managed cloud offering and has an official JavaScript client.
  • Weaviate: The open-source vector database Weaviate has an official TypeScript client.

Proprietary vector databases

  • Pinecone: Pinecone is a proprietary, managed vector database that supports text embeddings out of the box and has official JavaScript bindings.



The model is currently English-only. Please chime in on the GitHub issue if you'd like to see support for one of the pre-trained multilingual models.

Handling longer text

This embedding model performs best on sentences and short paragraphs. If you have longer text, consider:

  • Splitting the text: Split a long document into sentences or paragraphs and embed each one separately.
  • Truncating the text: Truncate the document to a specific section that represents its meaning well (e.g. an abstract section).
  • Averaging the embeddings: Embed each sentence or paragraph separately, then average the embeddings together to approximate the collective meaning.
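
The splitting and averaging strategies can be combined, as sketched below. This is an illustration under assumptions rather than an EnergeticAI feature: splitIntoSentences uses a naive punctuation-based rule, averageEmbeddings takes the element-wise mean, and embedLongText expects any object with an embed() method that accepts an array of strings, like the model returned by initModel(). A stub model is used so the example runs on its own.

```javascript
// Hypothetical helper: naive splitter that breaks after ., ! or ? followed by whitespace.
function splitIntoSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Hypothetical helper: element-wise mean of equal-length embeddings.
function averageEmbeddings(embeddings) {
  if (embeddings.length === 0) {
    throw new Error("Need at least one embedding to average");
  }
  const sum = new Array(embeddings[0].length).fill(0);
  for (const embedding of embeddings) {
    for (let i = 0; i < sum.length; i++) {
      sum[i] += embedding[i];
    }
  }
  return sum.map((x) => x / embeddings.length);
}

// Embed each sentence separately, then average into one document-level vector.
async function embedLongText(model, text) {
  const sentences = splitIntoSentences(text);
  const embeddings = await model.embed(sentences);
  return averageEmbeddings(embeddings);
}

// Stub standing in for the real model so this sketch runs without the package:
// it maps each string to a 2-dimensional vector [character count, word count].
const stubModel = {
  embed: async (texts) => texts.map((t) => [t.length, t.split(" ").length]),
};

embedLongText(stubModel, "Fruit is healthy. Fruit is delicious!").then((v) =>
  console.log(v)
);
```

An unweighted mean treats every sentence as equally important, which can blur the meaning of documents that cover several topics. Keeping the per-sentence embeddings alongside the average leaves room to search at either granularity.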