
Vector Databases (RAG)

2025-12-01

Before choosing a vector database for Retrieval-Augmented Generation (RAG), it’s important to understand the broader landscape of storage options available for retrieval systems. RAG pipelines can be built on several categories of databases—each optimized for different retrieval patterns, performance models, and architectural trade-offs:

  1. Traditional Databases (SQL / NoSQL)

Traditional relational databases (PostgreSQL, MySQL) and document stores (MongoDB, DynamoDB) can store raw text, metadata, and sometimes pre-computed embeddings, but they are not optimized for fast vector similarity search. They work best when the retrieval logic primarily depends on structured filters, metadata, or exact matches, with vector search bolted on through extensions or computed in application code.

Good for: small-scale RAG, structured queries, hybrid metadata filtering
Limitations: slow vector search, no native ANN indexes, poor scaling for high-dimensional embeddings
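
As a sketch of that pattern, the snippet below (SQLite and NumPy, all table and column names illustrative) filters on metadata in SQL and then brute-forces cosine similarity in application code:

```python
import json
import sqlite3

import numpy as np

# Illustrative schema: text plus metadata, with the embedding stored as JSON.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, category TEXT, text TEXT, embedding TEXT)"
)

def add_doc(doc_id, category, text, embedding):
    conn.execute(
        "INSERT INTO docs VALUES (?, ?, ?, ?)",
        (doc_id, category, text, json.dumps(list(embedding))),
    )

def search(query_vec, category, k=3):
    # Structured filtering first (the part these databases are good at).
    rows = conn.execute(
        "SELECT id, text, embedding FROM docs WHERE category = ?", (category,)
    ).fetchall()
    # Then brute-force cosine similarity in application code (the bolted-on part).
    q = np.asarray(query_vec)
    scored = []
    for doc_id, text, emb_json in rows:
        emb = np.asarray(json.loads(emb_json))
        sim = float(q @ emb / (np.linalg.norm(q) * np.linalg.norm(emb)))
        scored.append((sim, doc_id, text))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]
```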

  2. Hybrid Datastores With Optional Vector Indexing

These systems are “general-purpose” databases that have gained vector search through plugins or native features. Examples include:

  • PostgreSQL + pgvector
  • Elasticsearch / OpenSearch
  • Redis Vector Store

They provide approximate nearest neighbor (ANN) search, metadata filtering, and scalable indexing while still acting like general-purpose datastores.

Good for: mid-sized RAG, searchable logs, semantic search with metadata filters
Limitations: indexing performance is slower than purpose-built vector DBs, operational overhead, not ideal for billion-scale vectors
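
For instance, a minimal pgvector sketch, assuming a running PostgreSQL with the extension installed and the psycopg 3 driver; the connection string and the 384-dimension schema are placeholders:

```python
import psycopg  # psycopg 3; assumes PostgreSQL with pgvector installed

# Hypothetical connection string; adjust for your environment.
conn = psycopg.connect("dbname=rag user=postgres", autocommit=True)

with conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS chunks (
            id bigserial PRIMARY KEY,
            source text,
            body text,
            embedding vector(384)  -- must match your embedding model's dimension
        )
    """)
    # HNSW index for approximate nearest neighbor search (pgvector 0.5+).
    cur.execute(
        "CREATE INDEX IF NOT EXISTS chunks_hnsw "
        "ON chunks USING hnsw (embedding vector_cosine_ops)"
    )

    # Placeholder query embedding; in practice this comes from your embedding model.
    query_embedding = str([0.0] * 384)

    # Hybrid retrieval: metadata filter plus cosine-distance ordering
    # (<=> is pgvector's cosine distance operator).
    cur.execute("""
        SELECT body, embedding <=> %s::vector AS distance
        FROM chunks
        WHERE source = %s
        ORDER BY distance
        LIMIT 5
    """, (query_embedding, "docs"))
    print(cur.fetchall())
```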

  3. Purpose-Built Vector Databases

These databases are engineered specifically for high-performance vector similarity search, using ANN techniques such as HNSW, IVF, and product quantization (PQ). Examples include:

  • Pinecone
  • Weaviate
  • Milvus / Zilliz Cloud
  • Chroma
  • Qdrant

They excel at storing embeddings, performing top-k similarity search, managing high-dimensional data, and scaling horizontally. Many provide hybrid metadata filtering, re-ranking, and built-in replication.

Good for: production RAG systems, multi-million to billion vector scale, real-time indexing
Limitations: cost, operational complexity (for self-hosted), vendor lock-in (for managed)
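
As one concrete example, a minimal Chroma sketch; the embeddings are supplied directly as placeholder values, so no embedding model is needed to run it:

```python
import chromadb

client = chromadb.Client()  # in-process, ephemeral client
collection = client.create_collection(name="docs")

# Store pre-computed embeddings alongside documents and metadata.
collection.add(
    ids=["a", "b"],
    documents=["intro to HNSW", "release notes"],
    embeddings=[[0.1, 0.9, 0.0], [0.8, 0.1, 0.2]],
    metadatas=[{"kind": "guide"}, {"kind": "changelog"}],
)

# Top-k similarity search with a metadata filter.
results = collection.query(
    query_embeddings=[[0.1, 0.8, 0.1]],
    n_results=1,
    where={"kind": "guide"},
)
print(results["documents"])
```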

  4. Search Engines With Semantic Capabilities

Search engines built around inverted indexes, such as the Lucene-based Elasticsearch and OpenSearch, or Vespa, now support hybrid search that combines:

  • vector similarity
  • lexical scoring
  • ranking pipelines
  • BM25 + embeddings fusion

These are ideal when RAG must blend semantic search with keyword precision, for example over technical documentation, legal text, or code.

Good for: hybrid search, enterprise search portals, combining BM25 and embeddings
Limitations: heavier configuration, indexing overhead, sometimes slower pure vector performance than dedicated vector DBs
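
The fusion step can be approximated in a few lines. Below is a sketch of reciprocal rank fusion (RRF), one common way to combine BM25 and embedding rankings; the k=60 constant is the value commonly used in the RRF literature, and the doc IDs are placeholders:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of doc IDs into a single ranking.

    rankings: list of lists, each ordered best-first (e.g., one from BM25,
    one from vector similarity). Higher fused score means better.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # lexical ranking (placeholder IDs)
vector_hits = ["doc1", "doc9", "doc3"]  # semantic ranking
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # ['doc1', 'doc3', ...]
```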

  5. In-Memory and Lightweight Vector Stores

Useful for prototyping, local RAG, or embedded systems:

  • FAISS
  • Annoy
  • hnswlib

These run directly in memory and act more like libraries than full DBs.

Good for: local experiments, small RAG workloads, offline devices
Limitations: no database-style persistence (indexes are serialized to disk manually, if at all), no metadata filtering, no real clustering or replication
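
For example, a few lines of FAISS suffice for in-memory top-k search (exact L2 here; swapping in one of its ANN index types keeps the same interface):

```python
import faiss
import numpy as np

dim = 64
rng = np.random.default_rng(0)
corpus = rng.random((10_000, dim), dtype=np.float32)  # placeholder embeddings

index = faiss.IndexFlatL2(dim)  # exact L2 search; IndexHNSWFlat gives ANN
index.add(corpus)

query = rng.random((1, dim), dtype=np.float32)
distances, ids = index.search(query, 5)  # top-5 nearest neighbors
print(ids[0], distances[0])
```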

  6. Object Storage With External Indexing

A growing pattern: store documents in S3/GCS/Azure Blob and maintain embeddings in:

  • a vector DB
  • a general-purpose database with a vector extension (e.g., pgvector)
  • or even a custom FAISS index stored in object storage

This works well for large corpora and cheap long-term storage, with the vector index acting as the retrieval layer.

Good for: massive datasets, cloud-native architectures, cost-efficient storage
Limitations: complexity in syncing raw documents with embeddings
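
A rough sketch of the pattern, using boto3 and FAISS with a hypothetical bucket name; the id-to-key mapping stands in for whatever sync mechanism keeps the index and the bucket consistent:

```python
import boto3
import faiss
import numpy as np

BUCKET = "my-rag-corpus"  # hypothetical bucket name
s3 = boto3.client("s3")

# The FAISS index plus an id -> object-key mapping form the retrieval layer;
# the raw documents live cheaply in object storage.
dim = 384
index = faiss.IndexFlatIP(dim)
id_to_key = []  # position i in the index maps to id_to_key[i]

def add_document(key, embedding):
    index.add(np.asarray([embedding], dtype=np.float32))
    id_to_key.append(key)

def retrieve(query_embedding, k=3):
    _, ids = index.search(np.asarray([query_embedding], dtype=np.float32), k)
    docs = []
    for i in ids[0]:
        if i == -1:  # FAISS pads with -1 when fewer than k results exist
            continue
        obj = s3.get_object(Bucket=BUCKET, Key=id_to_key[i])
        docs.append(obj["Body"].read().decode("utf-8"))
    return docs
```

Keeping `id_to_key` (and the index itself) in sync with the bucket is exactly the complexity noted in the limitations above.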