Vector Databases · RAG
May 18, 2026 · 10 min read

Qdrant vs Pinecone vs pgvector vs Weaviate: Choosing the Right Vector Database for Production

Moneeb Abbas

AI Systems Architect

Every RAG system needs a vector store, and the choice matters more than most teams realize. Pick the wrong one and you pay for it in query latency, operational complexity, missing features, or a bill that grows faster than your user base. This is the comparison I wish had existed when I was making this decision across multiple production deployments.

What Actually Matters in a Vector Database

Benchmarks measuring raw ANN (approximate nearest neighbor) query speed are mostly noise for application-layer decisions. At the scale most production RAG systems operate — under 10 million vectors — every major vector database is fast enough. The meaningful differences are elsewhere:

  • Hybrid search support: Dense + sparse retrieval in a single query. Not all databases support this natively, and the ones that do not require you to merge results yourself.
  • Filtering performance: Querying 'top-k most similar vectors where metadata.client_id = X'. Naive implementations scan all vectors then filter; good implementations pre-filter or use payload indexes.
  • Operational overhead: Managed cloud vs self-hosted. Who handles upgrades, backups, and outages?
  • Cost model: Per-vector pricing, compute-based, or flat. This matters enormously at scale.
  • Data residency: Where do vectors live? Relevant for GDPR and HIPAA use cases.
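
To make the filtering point concrete, here is a minimal sketch in plain Python, not tied to any of the databases below. It assumes a toy in-memory list of documents and brute-force cosine similarity; the names `post_filter_search` and `pre_filter_search` are hypothetical. It shows why searching first and filtering afterwards can return fewer than k results for a tenant whose vectors are not among the global nearest neighbors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def post_filter_search(query, docs, client_id, k):
    """Naive approach: take the global top-k, then drop other clients' rows."""
    ranked = sorted(docs, key=lambda d: cosine(query, d["vec"]), reverse=True)[:k]
    return [d["id"] for d in ranked if d["client_id"] == client_id]

def pre_filter_search(query, docs, client_id, k):
    """Filter-aware approach: restrict candidates first, then rank."""
    candidates = [d for d in docs if d["client_id"] == client_id]
    ranked = sorted(candidates, key=lambda d: cosine(query, d["vec"]), reverse=True)
    return [d["id"] for d in ranked[:k]]

# Toy data: client 1 owns the vectors closest to the query.
DOCS = [
    {"id": "a", "client_id": 1, "vec": [1.0, 0.0]},
    {"id": "b", "client_id": 1, "vec": [0.9, 0.1]},
    {"id": "c", "client_id": 2, "vec": [0.6, 0.8]},
    {"id": "d", "client_id": 2, "vec": [0.0, 1.0]},
]
QUERY = [1.0, 0.0]

print(post_filter_search(QUERY, DOCS, client_id=2, k=2))  # client 2 loses its results
print(pre_filter_search(QUERY, DOCS, client_id=2, k=2))   # client 2 gets its own top-2
```

Real engines do this with indexes rather than a linear scan, but the failure mode of the naive ordering is the same.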

Qdrant

My default recommendation for new production RAG systems. Qdrant is open-source, self-hostable, and has the best native hybrid search implementation of any database in this comparison. It supports both dense vectors and sparse vectors (BM42/BM25) in a single collection with a single query.

  • Hybrid search: Native sparse + dense support with built-in fusion. No external BM25 index needed.
  • Filtering: Payload-indexed filtering with good pre-filtering performance — does not degrade significantly on filtered queries.
  • Self-hosted: Runs in a single Docker container. Managed cloud available (Qdrant Cloud) if you prefer.
  • Cost: Free and open-source for self-hosted. Cloud pricing is per-cluster, not per-vector — predictable as you scale.
  • Limitation: Smaller ecosystem and fewer managed integrations than Pinecone.
Tip: If you need hybrid search and are open to self-hosting, Qdrant is almost certainly the right choice. The native sparse vector support eliminates the need for a separate BM25 index and the complexity of merging results from two systems.
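
For a sense of what "merging results from two systems" involves, here is a sketch of reciprocal rank fusion (RRF), one widely used way to combine a dense ranking and a sparse ranking. It is an illustration of the merging step you would otherwise write yourself, not Qdrant's internal implementation. The document IDs and the constant `k=60` (a common default in the RRF literature) are assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists: each list adds 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists from a dense index and a separate BM25 index.
dense_hits = ["doc3", "doc1", "doc7"]
sparse_hits = ["doc1", "doc9", "doc3"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is the behavior you want from hybrid retrieval. With a database that fuses natively, this code and the second index both go away.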

Pinecone

The managed cloud option with the most mature ecosystem. Pinecone handles infrastructure entirely — no servers to manage, automatic scaling, and integrations with every major LLM framework. The tradeoff is cost and data residency.

  • Managed: Zero infrastructure management. Right for teams that want to move fast and not run servers.
  • Hybrid search: Supported via sparse-dense indexes (requires explicit sparse embeddings — not as seamless as Qdrant).
  • Filtering: Metadata filtering is good but can degrade on high-cardinality fields at large scale.
  • Cost: Per-write-unit and per-read-unit pricing. Becomes expensive quickly above ~10M vectors with high query rates.
  • Data residency: US and EU regions available. Not suitable for air-gapped or strict on-premises requirements.
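
To reason about when usage-based pricing overtakes a flat cluster price, a rough break-even calculation helps. The sketch below is deliberately simplified: it ignores storage charges, and every price in it is a made-up placeholder, not a real Pinecone or Qdrant Cloud rate. Substitute the numbers from the current pricing pages before drawing conclusions.

```python
def monthly_usage_cost(reads, writes, price_per_million_reads, price_per_million_writes):
    """Usage-based bill: read units and write units priced per million."""
    return (reads / 1e6) * price_per_million_reads + (writes / 1e6) * price_per_million_writes

def break_even_reads(flat_monthly_cost, writes, price_per_million_reads, price_per_million_writes):
    """Monthly read units above which a flat-priced cluster becomes cheaper."""
    write_cost = (writes / 1e6) * price_per_million_writes
    remaining = flat_monthly_cost - write_cost
    if remaining <= 0:
        return 0.0  # writes alone already exceed the flat price
    return remaining / price_per_million_reads * 1e6

# PLACEHOLDER prices for illustration only; not real vendor rates.
PLACEHOLDER_READ_PRICE = 10.0    # currency units per million read units
PLACEHOLDER_WRITE_PRICE = 2.0    # currency units per million write units
PLACEHOLDER_FLAT_CLUSTER = 300.0 # currency units per month

usage_bill = monthly_usage_cost(50_000_000, 10_000_000,
                                PLACEHOLDER_READ_PRICE, PLACEHOLDER_WRITE_PRICE)
threshold = break_even_reads(PLACEHOLDER_FLAT_CLUSTER, 10_000_000,
                             PLACEHOLDER_READ_PRICE, PLACEHOLDER_WRITE_PRICE)
print(usage_bill, threshold)
```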

pgvector

pgvector is a PostgreSQL extension that adds vector similarity search to your existing Postgres database. The killer advantage: if you are already running Postgres, you get vector search without adding a new infrastructure component.

  • Zero new infrastructure: Add vector search to your existing Postgres. Transactions, joins, and familiar SQL — vectors are just another column.
  • ACID compliance: Vectors stored in the same transaction as your application data. No sync lag between your relational database and a separate vector store.
  • Performance ceiling: HNSW index in pgvector is fast, but not competitive with dedicated vector databases at tens of millions of vectors under high concurrency.
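  • Hybrid search: No built-in hybrid query. Keyword ranking has to come from a separate full-text mechanism (Postgres tsvector or an external engine), and you merge the two result lists yourself. Recent pgvector releases add a sparsevec type, but you still have to produce the sparse embeddings and do the fusion on your own.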
  • Best fit: Under 1–2 million vectors, existing Postgres stack, no hybrid search requirement.
sql
-- pgvector usage: store and query embeddings in Postgres
CREATE EXTENSION IF NOT EXISTS vector;  -- enable pgvector (once per database)

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536),  -- OpenAI text-embedding-3-small
  client_id INTEGER,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Filtered similarity search
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE client_id = $2
ORDER BY embedding <=> $1
LIMIT 10;
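
A quick sizing estimate shows where the "1-2 million vectors" guideline comes from. The sketch below assumes the storage formula from the pgvector documentation for the `vector` type (4 bytes per dimension plus 8 bytes of overhead) and counts only the raw vector data. The HNSW index, the other columns, and row overhead all come on top, so treat the result as a lower bound.

```python
def pgvector_vector_bytes(dimensions):
    """Bytes for one pgvector `vector` value: 4 per dimension plus 8 overhead."""
    return 4 * dimensions + 8

def raw_vector_gib(num_vectors, dimensions):
    """Raw vector storage in GiB, excluding indexes and other columns."""
    return num_vectors * pgvector_vector_bytes(dimensions) / 2**30

# 1536 dimensions matches the embedding column in the table above.
for count in (1_000_000, 2_000_000, 10_000_000):
    print(f"{count:>10,} vectors: {raw_vector_gib(count, 1536):.2f} GiB raw")
```

HNSW search performs best when the index fits in memory, so these numbers translate fairly directly into how much RAM your Postgres instance needs.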

Weaviate

Weaviate is a full-featured vector database with built-in modules for embedding generation, hybrid search, and generative search (directly querying an LLM from within Weaviate). It sits between Qdrant and Pinecone in operational complexity.

  • Hybrid search: BM25 + vector via its own fusion algorithm. Well-implemented and documented.
  • Built-in embedding modules: Can call embedding APIs directly — useful for teams that want to store raw text and let Weaviate handle vectorization.
  • GraphQL API: Different query style from the REST/SDK interfaces of Qdrant and Pinecone. Steeper learning curve.
  • Self-hosted or cloud: Both available. Self-hosted requires more configuration than Qdrant.
  • Limitation: The built-in embedding and generative modules add convenience but create tight coupling to specific providers.
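
Weaviate's hybrid search exposes an `alpha` weight that balances vector and keyword results. As an illustration of score-based fusion in that style, here is a sketch that min-max normalizes each score set and blends them with alpha. It is a simplified stand-in, not Weaviate's actual implementation, and the function names and sample scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def weighted_score_fusion(vector_scores, keyword_scores, alpha=0.5):
    """Blend scores: alpha=1.0 is pure vector search, alpha=0.0 is pure keyword."""
    v = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    combined = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(v) | set(kw)
    }
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical scores: similarity from the vector index, BM25 from keyword search.
vector_scores = {"a": 1.0, "b": 0.5, "c": 0.0}
keyword_scores = {"b": 12.0, "c": 4.0}

balanced = [doc_id for doc_id, _ in weighted_score_fusion(vector_scores, keyword_scores, alpha=0.5)]
vector_only = [doc_id for doc_id, _ in weighted_score_fusion(vector_scores, keyword_scores, alpha=1.0)]
print(balanced, vector_only)
```

Unlike rank-based fusion, this approach keeps the size of the score gaps, which matters when one result is far stronger than the rest.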

The Decision Matrix

  • New RAG system, open to self-hosting, need hybrid search → Qdrant
  • Want zero infrastructure management, don't need air-gap → Pinecone
  • Already on Postgres, under 2M vectors, no hybrid search → pgvector
  • Need built-in embedding management and GraphQL interface → Weaviate
  • HIPAA or strict air-gap requirement → Qdrant self-hosted or pgvector on your own Postgres
  • Rapid prototyping with uncertain scale → Start with pgvector, migrate to Qdrant when you hit limits
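
The matrix can be written down as a small function, which makes the precedence between rules explicit. The ordering below (air-gap first, then the Postgres shortcut, then feature preferences) is my assumption about how to break ties, and the parameter names are hypothetical.

```python
def recommend(air_gapped=False, on_postgres=False, vector_count=0,
              needs_hybrid=False, wants_managed=False,
              wants_builtin_embeddings=False):
    """Map project constraints to a vector store, following the matrix above."""
    if air_gapped:
        # Hard constraint: rules out managed-only services.
        return "Qdrant self-hosted or pgvector on your own Postgres"
    if on_postgres and vector_count < 2_000_000 and not needs_hybrid:
        return "pgvector"
    if wants_builtin_embeddings:
        return "Weaviate"
    if wants_managed:
        return "Pinecone"
    return "Qdrant"

print(recommend(on_postgres=True, vector_count=500_000))
print(recommend(needs_hybrid=True))
print(recommend(wants_managed=True, vector_count=5_000_000))
```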

One pattern I have found works well: start with pgvector to prove the use case, then migrate to Qdrant when you need hybrid search or hit Postgres performance limits. The migration is a one-time job (bulk export vectors from Postgres, bulk import into Qdrant), and if your application talks to the store through a LangChain or LlamaIndex vector-store abstraction, the application code barely changes.
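
The shape of that migration job can be sketched without committing to a specific client library. The code below assumes rows arrive as dicts with the columns from the `documents` table above, and that `upsert_batch` is a callable you supply that wraps your target store's bulk-write call. Both are hypothetical stand-ins, and the batch size of 500 is an arbitrary starting point.

```python
from itertools import islice

def batched(rows, size):
    """Yield lists of up to `size` rows from any iterable, without loading it all."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def migrate(rows, upsert_batch, batch_size=500):
    """Stream rows from the source into the target store in fixed-size batches."""
    total = 0
    for batch in batched(rows, batch_size):
        points = [
            {
                "id": row["id"],
                "vector": row["embedding"],
                # Carry metadata across so filtered queries keep working.
                "payload": {"content": row["content"], "client_id": row["client_id"]},
            }
            for row in batch
        ]
        upsert_batch(points)  # caller-supplied: wraps the target store's bulk write
        total += len(points)
    return total

# Dry run with fake rows and a list standing in for the target store.
fake_rows = ({"id": i, "embedding": [0.0, 1.0], "content": "text", "client_id": 1}
             for i in range(1201))
written_batches = []
migrated = migrate(fake_rows, written_batches.append, batch_size=500)
print(migrated, [len(b) for b in written_batches])
```

Streaming in batches keeps memory flat regardless of table size, and a dry run against a fake target like this is a cheap way to check the row-to-point mapping before touching production data.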
