Getting Started
Postgres Vector (pgvector) is a PostgreSQL extension that adds vector similarity search to your database. It lets you store and query high-dimensional embeddings, such as those produced by OpenAI, Cohere, or Sentence Transformers models, directly alongside your relational data. Available as an open web service in Eyevinn Open Source Cloud, pgvector is a widely used choice for AI-powered search, semantic recommendations, and retrieval-augmented generation (RAG) applications. This tutorial walks you through the steps to get started.
Prerequisites
- If you have not already done so, sign up for an Eyevinn OSC account
Step 1: Create a Postgres Vector instance
Navigate to the Postgres Vector service in the Eyevinn OSC web console. Click Create pgvector and fill in:
| Field | Description |
|---|---|
| Name | Short alphanumeric name for your instance |
| PostgresPassword | Password for the postgres superuser (required) |
| PostgresUser | Superuser username (default: postgres) |
| PostgresDb | Default database name (default: same as user) |
Once the instance status turns green and shows it is running, click the instance card. Note the IP and port shown there; you will need them to build the connection string.
Step 2: Connect to the database
Based on the IP and port, the connection URL for your database is:
```
postgres://<user>:<password>@<IP>:<PORT>/<db>
```
For example, using the defaults:
```
postgres://postgres:mypassword@<IP>:<PORT>/postgres
```
Test the connection with psql:
```shell
psql "postgres://postgres:mypassword@<IP>:<PORT>/postgres"
```
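If your password contains characters such as `@`, `:` or `/`, it must be percent-encoded before it goes into the URL. Otherwise the URL is split in the wrong place. The sketch below builds the connection string with Python's standard `urllib.parse.quote`. The function name `build_pg_url` is hypothetical, and the host and port are placeholder values.

```python
from urllib.parse import quote


def build_pg_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Return a postgres:// connection URL with user, password and db percent-encoded."""
    # safe="" makes quote() also encode "/", which would otherwise be read as a path separator
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(db, safe='')}"
    )


# Hypothetical values; substitute the IP and port shown on your instance card
print(build_pg_url("postgres", "p@ss:w/rd", "203.0.113.10", 5432, "postgres"))
# postgres://postgres:p%40ss%3Aw%2Frd@203.0.113.10:5432/postgres
```

You can pass the resulting URL unchanged to psql or to `psycopg2.connect`. libpq decodes the percent-encoded parts.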
Step 3: Enable the pgvector extension
Once connected, enable the extension in each database where you want to use vector search:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
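The installed extension version determines which operators are available. For example, the `<+>` (L1 distance) operator was added in pgvector 0.7.0. The sketch below includes the catalog query that reports the installed version, followed by a hypothetical helper, `supports_l1_operator`, that checks the returned string. It assumes versions of the form major.minor.patch.

```python
from typing import Tuple

# Catalog query that returns the installed pgvector version, e.g. '0.7.4'
VERSION_QUERY = "SELECT extversion FROM pg_extension WHERE extname = 'vector';"


def parse_version(extversion: str) -> Tuple[int, ...]:
    """Turn a version string such as '0.7.4' into a comparable tuple (0, 7, 4)."""
    return tuple(int(part) for part in extversion.split("."))


def supports_l1_operator(extversion: str) -> bool:
    """Return True if this pgvector version has the <+> (L1 distance) operator (0.7.0+)."""
    return parse_version(extversion) >= (0, 7, 0)


print(supports_l1_operator("0.6.2"))  # False
print(supports_l1_operator("0.8.0"))  # True
```

With psycopg2 you would run `cur.execute(VERSION_QUERY)` and pass `cur.fetchone()[0]` to the helper.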
Step 4: Create a vector column and insert embeddings
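```sql
-- Create a table with a vector column. The dimension must match your embedding
-- model, e.g. vector(384) for all-MiniLM-L6-v2 or vector(1536) for
-- text-embedding-3-small. This example uses 3 dimensions to keep the literals short.
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(3)
);
-- Insert a document with its embedding; the vector must have exactly 3 elements,
-- otherwise PostgreSQL rejects the row
INSERT INTO documents (content, embedding)
VALUES ('Hello world', '[0.1, 0.2, 0.3]');
```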
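pgvector accepts a vector as a text literal of comma-separated numbers in square brackets. It rejects a value whose length differs from the column's declared dimension. A small helper like the hypothetical `to_vector_literal` below can build that literal from an embedding in Python. It also catches a dimension mismatch before the row reaches the database.

```python
from typing import Sequence


def to_vector_literal(values: Sequence[float], dim: int) -> str:
    """Format an embedding as a pgvector text literal, e.g. '[0.1,0.2,0.3]'.

    Raises ValueError if the embedding length differs from the column's
    declared dimension (vector(dim)), which PostgreSQL would reject anyway.
    """
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    # repr(float) gives the shortest representation that round-trips exactly
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


print(to_vector_literal([0.1, 0.2, 0.3], 3))  # [0.1,0.2,0.3]
```

Pass the literal as a query parameter with a `::vector` cast rather than formatting it into the SQL string. In psycopg2, for example, use `INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)`.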
Step 5: Query by similarity
```sql
-- Find the 5 most similar documents to a query vector (cosine distance)
SELECT id, content, 1 - (embedding <=> '[0.1, 0.15, 0.25]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.15, 0.25]'
LIMIT 5;
```
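Without an index, a query like this is an exact nearest-neighbor search. PostgreSQL computes the cosine distance from the query vector to every row, sorts by it, and keeps the first five. The pure-Python sketch below mirrors that computation on in-memory data to show what `<=>` and `1 - (embedding <=> query)` evaluate to. It only illustrates the query's semantics, not how pgvector is implemented, and the names `cosine_distance` and `top_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity <=> returns (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    rows: List[Tuple[int, str, List[float]]], query: Sequence[float], k: int = 5
) -> List[Tuple[int, str, float]]:
    """Return (id, content, similarity) for the k rows closest to query."""
    # Equivalent of: ORDER BY embedding <=> query LIMIT k
    ranked = sorted(rows, key=lambda row: cosine_distance(row[2], query))[:k]
    # Equivalent of the similarity column: 1 - (embedding <=> query)
    return [(rid, content, 1.0 - cosine_distance(emb, query)) for rid, content, emb in ranked]


rows = [
    (1, "Hello world", [0.1, 0.2, 0.3]),
    (2, "Unrelated", [0.9, -0.1, 0.0]),
]
for rid, content, sim in top_k(rows, [0.1, 0.15, 0.25], k=5):
    print(rid, content, round(sim, 4))
```

This full scan grows linearly with the number of rows. The indexes in the next step are designed to avoid that cost.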
Supported distance operators:
| Operator | Distance metric |
|---|---|
| `<->` | L2 (Euclidean) distance |
| `<#>` | Negative inner product |
| `<=>` | Cosine distance |
| `<+>` | L1 (Manhattan) distance |
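The pure-Python sketch below shows what each operator returns. The function names are hypothetical, and pgvector implements these metrics in C, not like this.

`<#>` returns the negative inner product. PostgreSQL index scans on operators only support ascending order, so negating the inner product lets an ascending `ORDER BY` still list the most similar vectors first.

For vectors normalized to unit length, two identities hold:

- cosine distance equals 1 plus the negative inner product
- squared L2 distance equals twice the cosine distance

As a result, `<->`, `<#>` and `<=>` rank such vectors identically.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """<-> : Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """<#> : inner product multiplied by -1, so smaller means more similar."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """<=> : 1 - cosine similarity (assumes neither vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """<+> : Manhattan (taxicab) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = [3.0, 4.0], [4.0, 3.0]
print(l2_distance(a, b))                # 1.4142135623730951
print(negative_inner_product(a, b))     # -24.0
print(round(cosine_distance(a, b), 4))  # 0.04
print(l1_distance(a, b))                # 2.0
```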
Step 6: Index for fast approximate nearest neighbor search
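For large datasets, create an HNSW or IVFFlat index to speed up queries. Both are approximate nearest neighbor indexes, so results can differ slightly from an exact scan. The index's operator class must also match the operator in your query: `vector_cosine_ops` for `<=>`, `vector_l2_ops` for `<->` and `vector_ip_ops` for `<#>`.
```sql
-- HNSW index (recommended: better speed/recall trade-off than IVFFlat and no
-- training step, at the cost of slower builds and more memory)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
-- IVFFlat index (create it after loading data: it clusters the rows that
-- exist when the index is built)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```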
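The pgvector README suggests starting values for IVFFlat:

- `lists`: about rows / 1000 for up to 1M rows, and sqrt(rows) above that
- `ivfflat.probes` at query time: about sqrt(lists); more probes improve recall at the cost of speed

The hypothetical helper `ivfflat_settings` below turns those rules of thumb into SQL statements. Treat the numbers as starting points to tune against your own recall and latency targets, not fixed values.

```python
import math
from typing import Tuple


def ivfflat_settings(
    row_count: int, table: str = "documents", column: str = "embedding"
) -> Tuple[str, str]:
    """Return (CREATE INDEX statement, SET probes statement) from pgvector's starting-point heuristics.

    table and column are interpolated directly, so only pass trusted identifiers.
    """
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)  # rows / 1000 up to 1M rows
    else:
        lists = math.isqrt(row_count)      # sqrt(rows) above 1M rows
    probes = max(1, round(math.sqrt(lists)))  # sqrt(lists) at query time
    create = (
        f"CREATE INDEX ON {table} USING ivfflat ({column} vector_cosine_ops) "
        f"WITH (lists = {lists});"
    )
    # SET applies to the current session; SET LOCAL limits it to one transaction
    return create, f"SET ivfflat.probes = {probes};"


create_sql, probes_sql = ivfflat_settings(100_000)
print(create_sql)  # CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
print(probes_sql)  # SET ivfflat.probes = 10;
```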
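Application Usage Example
The following Python example embeds a question with OpenAI and retrieves the five most similar documents:
```python
import openai
import psycopg2

# Assumes documents.embedding is vector(1536), the output size of text-embedding-3-small,
# and that OPENAI_API_KEY is set in the environment for the OpenAI client.
conn = psycopg2.connect("postgres://postgres:mypassword@<IP>:<PORT>/postgres")
cur = conn.cursor()

# Generate an embedding for the query text with OpenAI
response = openai.embeddings.create(input="What is OSC?", model="text-embedding-3-small")
embedding = response.data[0].embedding

# str(embedding) yields a '[x, y, ...]' literal that the ::vector cast accepts
cur.execute(
    "SELECT content FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
    (str(embedding),),
)
results = cur.fetchall()
for (content,) in results:
    print(content)

cur.close()
conn.close()
```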
Using the CLI
You can also create the instance from the command line with the OSC CLI:
```shell
osc create pgvector-pgvector myvectordb \
  -o PostgresPassword="mypassword" \
  -o PostgresDb="vectors"
```