"Just use Postgres." The vector extension that lets one database do transactional data plus semantic search.
Fully managed, scale-to-zero, pay-per-request. The "I never want to operate a vector DB" option.
Open source, self-hostable, fast. The sweet spot between "just Postgres" and "managed-only".
Pick pgvector when you are already on Postgres and your corpus is under ~10M vectors. Semantic search inside your transactional DB, ACID guarantees, JOINs with relational data.
Pick Pinecone for zero operational surface at scale. Fully managed, serverless pricing, billions of vectors with no sharding to think about. Best when RAG is the product and ops budget is thin.
Pick Qdrant for an open-source, self-hostable vector DB that scales past pgvector. Pinecone-class capability at a fraction of the per-vector cost. Qdrant Cloud is there if you change your mind.
Use pgvector as your primary index for small-to-medium workloads and promote to Pinecone or Qdrant when a specific collection outgrows it. Many teams run pgvector for most collections and one dedicated vector DB for the one huge index - not one-size-fits-all.
Three vector backends, same 768-dim embeddings with HNSW. pgvector is competitive through ~10M vectors, then its curve steepens. Pinecone and Qdrant stay flat into the 100M+ range. The shaded band is the decision region - below it pgvector is fine, above it a dedicated vector DB wins.
Illustrative latency curves for 768-dim HNSW at p95, drawn from ANN-Benchmarks, Qdrant's published numbers, and community benchmarks of pgvector 0.8. Real numbers shift with filter selectivity, quantization, and hardware. The shape - pgvector's curve steepens past ~10M, Pinecone and Qdrant stay flat - is stable across setups.
A 6-step mental model for picking the right vector backend based on your corpus size, your ops capacity, and what you are actually building.
If you already run Postgres and your corpus is under ~10M vectors, pgvector is probably enough. The default answer in 2026 should be "try pgvector first" - then measure. Many teams end up never needing a dedicated vector DB.
Why this is not a win: pgvector adds zero services - it lives in your existing Postgres. Pinecone is 100% managed, zero self-host option. Qdrant lets you choose. Which is "best" depends on who is on-call.
Why it matters: Pinecone was designed for billion-scale. Qdrant handles hundreds of millions well. pgvector scales further than people think (100M+ with HNSW tuning) but requires care.
Why it matters: All three use HNSW as the workhorse. Qdrant offers the most tuning knobs (quantization, payload on disk, on-disk HNSW) exposed through a clean API. Pinecone hides the algorithm; pgvector exposes ef_construction and m.
Why this is not a win: pgvector wins expressiveness (any SQL), but the optimizer may not always combine vector search + filter efficiently. Qdrant's payload filtering is fast and well-integrated. Pinecone supports common filters but is less flexible than SQL.
Why it matters: Qdrant leads on native hybrid search including BM25-style sparse indexes. Pinecone supports hybrid with their sparse-dense index. pgvector requires combining pg_trgm or a full-text search with vector queries manually.
Why it matters: Qdrant and Pinecone both handle 100M+ vectors with single-digit to tens-of-millisecond latencies. pgvector is competitive at smaller scale but struggles to match them past ~10M without serious hardware and tuning.
Why it matters: pgvector is typically the cheapest because it rides on Postgres you already pay for. Pinecone is usage-priced and can scale up fast. Qdrant self-hosted is cheap per-vector; Qdrant Cloud sits in between.
Why this is not a win: pgvector and Pinecone both land at "zero new ops" for different reasons. Qdrant self-hosted requires operational attention; Qdrant Cloud does not.
Why it matters: Pinecone namespaces are purpose-built for multi-tenant RAG (one index, many namespaces, fast filtering). Qdrant covers the same need with per-tenant payload filtering or separate collections. pgvector's multi-tenancy is SQL-style, flexible but less purpose-built.
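To make the namespace model concrete, here is a minimal sketch of tenant isolation in Pinecone, assuming an existing "docs" index and the same OpenAI embedding setup used in the code examples further down; the index name, tenant IDs, and helper names are illustrative.

# Multi-tenant RAG with Pinecone namespaces - a minimal sketch, names illustrative.
from pinecone import Pinecone
from openai import OpenAI

client = OpenAI()
pc = Pinecone(api_key="...")
index = pc.Index("docs")

def embed(text):
    return client.embeddings.create(
        model="text-embedding-3-large", input=text,
        dimensions=1536,  # match the 1536-dim index used in the examples below
    ).data[0].embedding

def upsert_for_tenant(tenant_id, doc_id, content):
    index.upsert(
        vectors=[{"id": str(doc_id), "values": embed(content),
                  "metadata": {"content": content}}],
        namespace=tenant_id,  # one namespace per tenant, one shared index
    )

def query_for_tenant(tenant_id, text, k=5):
    res = index.query(vector=embed(text), top_k=k,
                      include_metadata=True, namespace=tenant_id)
    return [m.metadata["content"] for m in res.matches]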
Illustrative latency shapes on 768-dim OpenAI-style embeddings with HNSW indexes (pgvector defaults, Pinecone serverless, Qdrant 1.x). Numbers shift with dimensionality, filter selectivity, and hardware. Qualitative shape is stable across community benchmarks.
| Operation | Dataset | pgvector | Pinecone | Qdrant | Delta |
|---|---|---|---|---|---|
| p95 query latency at 1M vectors | 768-dim, HNSW, top-10 | ~8 ms | ~15 ms | ~6 ms | - |
| p95 query latency at 100M vectors | 768-dim, HNSW, top-10 | ~150 ms (careful tuning) | ~25 ms | ~20 ms | ~6-7x over pgvector |
| Insert throughput | bulk upsert 1M vectors | ~5k vec/sec | ~30k vec/sec | ~40k vec/sec | - |
| Cost per 10M vectors (storage) | 768-dim, HNSW index | ~$30/mo (Postgres disk) | ~$75/mo (serverless storage) | ~$40/mo (self-host) / ~$60 (Cloud) | - |
| Filter + vector search (moderate selectivity) | 10% of rows match filter | ~40 ms (post-filter) | ~20 ms (native filter) | ~12 ms (payload filter) | - |
Below is a minimal "store a document, query by embedding" example in each system's native form. The API surfaces differ more than the concepts. All three support HNSW, cosine / L2 / dot product, top-k retrieval, and metadata filtering - they just expose it differently.
# pgvector - just Postgres + SQL
import numpy as np
import psycopg
from pgvector.psycopg import register_vector  # pip install pgvector
from openai import OpenAI

client = OpenAI()
conn = psycopg.connect("postgresql://...")
register_vector(conn)  # register the vector type with psycopg

# One-time setup:
# CREATE EXTENSION vector;
# CREATE TABLE docs (
#     id bigserial PRIMARY KEY,
#     content text, embedding vector(1536));
# CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);

def upsert(id, content):
    emb = client.embeddings.create(
        model="text-embedding-3-large", input=content,
        dimensions=1536,  # match the vector(1536) column
    ).data[0].embedding
    conn.execute(
        "INSERT INTO docs (id, content, embedding) VALUES (%s, %s, %s)",
        (id, content, np.array(emb)),
    )
    conn.commit()

def query(text, k=5):
    qemb = client.embeddings.create(
        model="text-embedding-3-large", input=text,
        dimensions=1536,
    ).data[0].embedding
    rows = conn.execute(
        "SELECT content FROM docs ORDER BY embedding <=> %s LIMIT %s",
        (np.array(qemb), k),
    ).fetchall()
    return [r[0] for r in rows]

# Pinecone - managed serverless
from pinecone import Pinecone
from openai import OpenAI

client = OpenAI()
pc = Pinecone(api_key="...")
index = pc.Index("docs")  # created in the console or via API (setup sketch below)

def upsert(id, content):
    emb = client.embeddings.create(
        model="text-embedding-3-large", input=content,
        dimensions=1536,  # match the index dimension
    ).data[0].embedding
    index.upsert(vectors=[
        {"id": str(id), "values": emb, "metadata": {"content": content}},
    ])

def query(text, k=5):
    qemb = client.embeddings.create(
        model="text-embedding-3-large", input=text,
        dimensions=1536,
    ).data[0].embedding
    res = index.query(vector=qemb, top_k=k, include_metadata=True)
    return [m.metadata["content"] for m in res.matches]

# Qdrant - open source, self-host or Qdrant Cloud
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, Distance, VectorParams
from openai import OpenAI

client = OpenAI()
qdrant = QdrantClient(url="http://localhost:6333")

# One-time setup:
# qdrant.create_collection(
#     collection_name="docs",
#     vectors_config=VectorParams(size=1536, distance=Distance.COSINE))

def upsert(id, content):
    emb = client.embeddings.create(
        model="text-embedding-3-large", input=content,
        dimensions=1536,  # match the collection's vector size
    ).data[0].embedding
    qdrant.upsert(
        collection_name="docs",
        points=[PointStruct(id=id, vector=emb, payload={"content": content})],
    )

def query(text, k=5):
    qemb = client.embeddings.create(
        model="text-embedding-3-large", input=text,
        dimensions=1536,
    ).data[0].embedding
    # qdrant-client 1.10+ uses query_points, which supersedes the older .search().
    res = qdrant.query_points(
        collection_name="docs", query=qemb, limit=k,
    )
    return [p.payload["content"] for p in res.points]

Note: All three are within 5-10 lines of each other for the basic case. pgvector wins on "no new infrastructure." Pinecone wins on "zero ops forever." Qdrant wins on "same capability class as Pinecone, self-hosted or cloud."
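The Pinecone example assumes the index already exists. For parity with the pgvector and Qdrant one-time setup comments, here is a hedged sketch of creating it via the API; the cloud, region, and dimension values are illustrative and must match your embedding setup.

# Pinecone one-time setup via the API - a minimal sketch, values illustrative.
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="...")
pc.create_index(
    name="docs",
    dimension=1536,  # must match the embedding dimension used at upsert time
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)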
# pgvector - SQL WHERE plus vector similarity
# embed() stands in for the OpenAI embeddings call from the basic example above.
def query_filtered(text, user_id, k=5):
    qemb = embed(text)
    rows = conn.execute(
        """SELECT content FROM docs
           WHERE user_id = %s AND tags @> ARRAY[%s]::text[]
           ORDER BY embedding <=> %s LIMIT %s""",
        (user_id, "public", qemb, k),
    ).fetchall()
    return [r[0] for r in rows]

# SQL WHERE is infinitely expressive.
# The planner will fetch-and-filter, which can be slow if the filter
# is very restrictive. Partial indexes help. HNSW + filter is a
# known-tricky combination for the optimizer.

# Pinecone - filter expression in query
def query_filtered(text, user_id, k=5):
    qemb = embed(text)
    res = index.query(
        vector=qemb, top_k=k,
        filter={"user_id": {"$eq": user_id}, "tags": {"$in": ["public"]}},
        include_metadata=True,
    )
    return [m.metadata["content"] for m in res.matches]

# Filters are evaluated at search time inside the index.
# Pinecone namespaces can also partition by tenant for
# multi-tenant scale.

# Qdrant - rich payload filter DSL
from qdrant_client.models import Filter, FieldCondition, MatchValue, MatchAny

def query_filtered(text, user_id, k=5):
    qemb = embed(text)
    # qdrant-client 1.10+ uses query_points, which supersedes the older .search().
    res = qdrant.query_points(
        collection_name="docs",
        query=qemb,
        query_filter=Filter(must=[
            FieldCondition(key="user_id", match=MatchValue(value=user_id)),
            FieldCondition(key="tags", match=MatchAny(any=["public"])),
        ]),
        limit=k,
    )
    return [p.payload["content"] for p in res.points]

# Filtering is fast and interleaves with HNSW traversal.
# Payload indexes can be created for high-selectivity filters (sketched below).

Note: Metadata filtering is where pgvector's SQL expressiveness competes with dedicated vector DBs' filter integration. If your filter is highly selective, Qdrant's and Pinecone's engines handle it more efficiently than pgvector's post-filter approach.
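Here is a minimal sketch of the payload-index setup referenced in the comment above; the field name and schema type follow this example's payload, not anything Qdrant mandates.

# One-time payload index for fast, selective filtering - a minimal sketch.
from qdrant_client import QdrantClient
from qdrant_client.models import PayloadSchemaType

qdrant = QdrantClient(url="http://localhost:6333")
qdrant.create_payload_index(
    collection_name="docs",
    field_name="user_id",
    field_schema=PayloadSchemaType.KEYWORD,  # exact-match index on the tenant field
)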
For most teams in 2026, pgvector is enough. If you already run Postgres and your corpus is under ~10M vectors, adding a dedicated vector DB is usually premature. Start with pgvector, measure query latency and insert throughput against your SLOs, and promote to Pinecone or Qdrant only when you hit real limits. The "we switched from Pinecone to pgvector" blog posts of 2023-2025 were not wrong.
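One rough way to do that measurement, assuming the query() helper from whichever backend's example above you are evaluating; the query mix and the SLO threshold are yours to supply.

# Rough p95 latency probe against your own query mix - a sketch, not a benchmark harness.
import time

def p95_latency_ms(sample_queries, k=5):
    timings = []
    for q in sample_queries:
        start = time.perf_counter()
        query(q, k=k)  # the query() helper defined in the examples above
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return timings[min(len(timings) - 1, int(0.95 * len(timings)))]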
Comfortably 1-10 million vectors with HNSW indexes on modest Postgres hardware. With careful tuning (ef_construction, m, shared_buffers, enough RAM to fit the index) pgvector can handle 100M vectors, but latencies grow past 10M and insert throughput lags dedicated vector DBs. Past 100M vectors, Pinecone or Qdrant is the right answer.
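The tuning knobs mentioned above, expressed as a sketch against the docs table and conn from the earlier pgvector example; the specific values are illustrative starting points to measure from, not recommendations.

# pgvector HNSW tuning - illustrative values, assuming the conn from the earlier example.
conn.execute("SET maintenance_work_mem = '2GB'")  # faster index builds
conn.execute(
    "CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops) "
    "WITH (m = 16, ef_construction = 128)"  # graph fan-out and build-time effort
)
conn.execute("SET hnsw.ef_search = 100")  # per-session recall vs. latency trade-off
conn.commit()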
At small scale (1M vectors), all three are within 5-15ms p95 on typical queries. At 100M+, Qdrant and Pinecone are 5-10x faster than pgvector because they were designed for that scale. Qdrant tends to edge Pinecone on pure query latency; Pinecone tends to edge Qdrant on zero-ops scale-to-billions.
No. Pinecone is a proprietary managed SaaS. There is no open-source Pinecone and no self-host option. If open source or self-host matters to your organization, Qdrant (Apache 2.0) or pgvector (PostgreSQL License) are the choices.
Yes, but you wire it yourself. Use Postgres full-text search or pg_trgm for the sparse / keyword side and pgvector for dense, then combine results in your application or a SQL UNION. Qdrant supports native hybrid search with sparse + dense + fusion in a single query, which is simpler and often faster if hybrid is on your hot path.
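A minimal sketch of that wiring, assuming the docs table and conn from the pgvector example, a GIN index on to_tsvector('english', content), and the embed() helper from the filtered examples; the RRF constant 60 is the conventional default, not a tuned value.

# Hybrid search with pgvector + Postgres full-text, fused in the application - a sketch.
def hybrid_query(text, k=5):
    dense = conn.execute(
        "SELECT id, content FROM docs ORDER BY embedding <=> %s LIMIT %s",
        (embed(text), k * 4),
    ).fetchall()
    sparse = conn.execute(
        """SELECT id, content FROM docs
           WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
           ORDER BY ts_rank(to_tsvector('english', content),
                            plainto_tsquery('english', %s)) DESC
           LIMIT %s""",
        (text, text, k * 4),
    ).fetchall()
    # Reciprocal rank fusion: score(doc) = sum over result lists of 1 / (60 + rank).
    scores, content_by_id = {}, {}
    for results in (dense, sparse):
        for rank, (doc_id, doc_content) in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (60 + rank)
            content_by_id[doc_id] = doc_content
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [content_by_id[doc_id] for doc_id in top]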
Functionally close, but not a drop-in. The APIs differ - Pinecone uses index/upsert/query; Qdrant uses collections, points, and query_points. The concepts map cleanly, but your integration code will need changes. Qdrant Cloud offers Pinecone-like managed hosting if you want the managed story without Pinecone's pricing.
pgvector: roughly your existing Postgres cost plus index disk (~$30/month on a 2-core 16GB instance with HNSW). Pinecone Serverless: roughly $75/month for 10M vectors including moderate read/write. Qdrant self-hosted: ~$40/month on a small VM with the index in RAM. Qdrant Cloud: ~$60/month at similar scale. pgvector is typically the cheapest if you already pay for Postgres.
No. All three use CPU-based HNSW by default in 2026. Qdrant has GPU-accelerated search as an optional feature for very large collections where latency is critical. Pinecone uses hardware acceleration internally without exposing it. pgvector is purely CPU. For most workloads, CPU HNSW is fast enough.