PostgreSQL Full-Text Search: Dropping Elasticsearch for 90% of Use Cases

The Practical Developer

The Practica · 2026-05-22 · via The Practical Developer

The product catalog has 200,000 SKUs and the search box is the highest-traffic feature on the site. Six months ago, someone decided that LIKE '%query%' was too slow and added Elasticsearch. Now you run two datastores, a sync pipeline that breaks every other deploy, and a mapping version conflict that corrupted last Tuesday’s index rebuild. The search is fast, but the operational surface area doubled for a problem Postgres already solves natively.

PostgreSQL’s full-text search is not a toy feature. It has been part of core since version 8.3 and supports ranked relevance, phrase matching, highlighting, and multiple languages out of the box. For catalogs, documentation, support tickets, and internal admin search, it is often the only tool you need. This post shows the schema, indexing, and query patterns that turn a slow LIKE scan into a sub-10-millisecond ranked search, plus the limits where Elasticsearch still wins.

Why LIKE and ILIKE fall over

A query like SELECT * FROM products WHERE name ILIKE '%leather%' cannot use a B-tree index. Postgres must scan every row, lowercase the column, and pattern-match against the input. On 200,000 rows, that is 200,000 string comparisons. Add an OR description ILIKE '%leather%' and each row now pays for a second, usually longer, comparison. On a busy site, these queries show up in pg_stat_statements with execution times in the hundreds of milliseconds and rapidly rising buffer reads.

You can add a B-tree index (with text_pattern_ops in a non-C locale) and use LIKE 'leather%' (anchored to the left), but that only helps prefix searches. A pg_trgm GIN index goes further and accelerates unanchored patterns like '%leather%', but it still matches raw substrings. Users expect word-stem matching, so “boot” matches “boots.” They expect relevance ranking, so an exact title match beats a passing mention in a description. They expect typo tolerance, which B-tree indexes do not provide at all and trigram indexes provide only for mild misspellings. At some point, teams give up and reach for Elasticsearch. But Postgres has a dedicated full-text index type and query language that handles the first two of those expectations without leaving the database.

tsvector and tsquery: the core idea

Full-text search in Postgres works by converting text into a tsvector, a sorted list of lexemes (normalized word stems) with positional information. A query is converted into a tsquery, a structured search predicate. The operator @@ checks whether the tsvector matches the tsquery.

SELECT to_tsvector('english', 'The quick brown fox') @@ to_tsquery('english', 'quick & brown');
-- returns true

to_tsvector strips stop words (“the”), lowercases, and stems (“foxes” becomes “fox”). to_tsquery parses boolean operators: & for AND, | for OR, ! for NOT, and <-> for followed by. This is not string matching. It is linguistic indexing, and it is fast because the tsvector is precomputed once and then indexed with GIN.
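To see what the normalization step actually stores, it can help to print a tsvector and a tsquery directly. The sample sentences below are made up for illustration, and the output comments assume the stock english configuration.

```sql
-- Stop words are dropped but still consume a position; the rest are stemmed.
SELECT to_tsvector('english', 'The quick brown foxes jumped');
-- 'brown':3 'fox':4 'jump':5 'quick':2

-- The query side goes through the same stemming.
SELECT to_tsquery('english', 'foxes & !slow');
-- 'fox' & !'slow'

-- Matching compares lexemes, so "foxes" in the query finds "fox" in the text.
SELECT to_tsvector('english', 'A fox ran past')
       @@ to_tsquery('english', 'foxes & !slow');
-- returns true
```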

The key insight is that you do not call to_tsvector on the column at query time. You store the tsvector in a generated column (available since Postgres 12), index it, and query against the index. That turns search into an index lookup that fetches only the matching rows, not a scan of the whole table.

Building the search column and index

Start with a products table:

CREATE TABLE products (
  id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  name text NOT NULL,
  description text,
  category text,
  search_vector tsvector GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(category, '')), 'C')
  ) STORED
);

setweight tags each source field with a priority: A (highest) for name, B for description, C for category. When ranking, a match in the name scores higher than a match in the description. The coalesce prevents NULL inputs from nullifying the whole vector.
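To make the weight labels concrete, the sketch below builds the same kind of vector from two literal strings (made up for illustration) rather than table columns. Each position carries its weight letter, and positions from the second string continue where the first one left off.

```sql
SELECT
  setweight(to_tsvector('english', 'Leather Boots'), 'A') ||
  setweight(to_tsvector('english', 'Waterproof hiking boots'), 'B');
-- 'boot':2A,5B 'hike':4B 'leather':1A 'waterproof':3B
```

The lexeme “boot” appears once with weight A (from the name-like string) and once with weight B, which is what lets ranking favor the first occurrence.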

Now add the GIN index:

CREATE INDEX idx_products_search ON products USING GIN (search_vector);

GIN (Generalized Inverted Index) is the right index type for full-text search. It stores each lexeme with a posting list of row IDs that contain it. A search for “leather” jumps directly to the posting list for the lexeme “leather,” then intersects it with other lexeme lists if the query has multiple terms. The index is larger than a B-tree (expect 30-50% of the table size), but lookups are logarithmic in the number of unique lexemes, not the number of rows.

Querying with ranking

A basic ranked search looks like this:

SELECT
  id,
  name,
  ts_rank_cd(search_vector, query, 32) AS rank
FROM products,
  plainto_tsquery('english', 'leather boots') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

plainto_tsquery converts plain user input into a tsquery, inserting & between words. ts_rank_cd computes a relevance score based on term frequency, proximity, and the weights you assigned. The optional third argument is a normalization bitmask. The value 32 used here rescales the score to rank / (rank + 1), which keeps it between 0 and 1. To divide the rank by the document length, so a short title match does not lose to a long description that mentions the term ten times, use 2 instead, or 2|32 to combine both.

This query uses the GIN index for the @@ filter, then sorts the results by rank. If your result set after filtering is small (under a few thousand), the sort is cheap. If it is large, add a WHERE clause on category or price to narrow the set before ranking.
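If the search box should accept quoted phrases and exclusions, websearch_to_tsquery (Postgres 11 and later) is a possible alternative to plainto_tsquery; it is documented never to raise a syntax error on user input. The sketch below also passes an explicit weight array to ts_rank_cd. The values shown are the documented defaults in {D, C, B, A} order, and the search string is only an example.

```sql
SELECT
  id,
  name,
  ts_rank_cd(
    '{0.1, 0.2, 0.4, 1.0}',  -- weights for D, C, B, A (these are the defaults)
    search_vector,
    query,
    2 | 32                   -- divide by document length, then scale into 0..1
  ) AS rank
FROM products,
  -- Quoted text becomes a phrase, a leading minus becomes NOT.
  websearch_to_tsquery('english', '"leather boots" -kids') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
```

Raising the A weight relative to B is the simplest knob to turn if name matches are not floating to the top as strongly as expected.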

Highlighting results

Users want to see why a result matched. ts_headline extracts fragments with search terms highlighted:

SELECT
  id,
  name,
  ts_headline(
    'english',
    description,
    query,  -- reuse the tsquery from the FROM clause instead of parsing it twice
    'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15'
  ) AS highlight
FROM products,
  plainto_tsquery('english', 'leather boots') query
WHERE search_vector @@ query
ORDER BY ts_rank_cd(search_vector, query) DESC
LIMIT 20;

ts_headline is powerful but not free. It re-parses the source text at query time. For high-traffic search, consider caching the highlighted snippet in application code or using it only on the top N results after ranking.
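One way to limit ts_headline to the top N results is to rank and cut in a subquery, then highlight only the rows that survive. This is a sketch against the products table from this post; the alias names (top_hits, q) are arbitrary.

```sql
SELECT
  id,
  name,
  ts_headline(
    'english',
    description,
    query,
    'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15'
  ) AS highlight
FROM (
  -- Inner query: filter, rank, and keep 20 rows without touching ts_headline.
  SELECT p.id, p.name, p.description, q.query,
         ts_rank_cd(p.search_vector, q.query) AS rank
  FROM products p,
       plainto_tsquery('english', 'leather boots') AS q(query)
  WHERE p.search_vector @@ q.query
  ORDER BY rank DESC
  LIMIT 20
) AS top_hits
ORDER BY rank DESC;
```

Written this way, the description text is re-parsed for at most 20 rows no matter how many rows matched the filter.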

Phrase and proximity search

plainto_tsquery treats input as an AND of independent words. If the user searches for “database index,” it matches a product named “index cards for your database.” To require the words in order, use phraseto_tsquery:

SELECT * FROM products
WHERE search_vector @@ phraseto_tsquery('english', 'database index');

For custom proximity, tsquery supports the distance operator <N>:

SELECT * FROM products
WHERE search_vector @@ to_tsquery('english', 'database <2> index');

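This matches “database” followed by “index” exactly two word positions later, that is, with one word in between. The <N> operator is an exact distance, not an upper bound, so to accept the words either adjacent or one word apart, OR the two cases together: 'database <-> index | database <2> index'. It is useful for disambiguating compound terms without requiring an exact phrase.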
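The distance in <N> is exact, which is easy to check on literal strings. The sample texts below are made up for illustration.

```sql
-- phraseto_tsquery joins the stemmed words with the adjacency operator.
SELECT phraseto_tsquery('english', 'database index');
-- 'databas' <-> 'index'

-- Adjacent words are one position apart, so <2> does not match them...
SELECT to_tsvector('english', 'database index')
       @@ to_tsquery('english', 'database <2> index');
-- returns false

-- ...but it does match when exactly one word sits in between.
SELECT to_tsvector('english', 'database query index')
       @@ to_tsquery('english', 'database <2> index');
-- returns true
```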

Prefix matching for autocomplete

Full-text search does not do prefix matching by default. “boot” will match “boots” because stemming normalizes both to “boot,” but “boot” will not match “bootstrap” because the stem is different. For autocomplete, combine full-text search with pg_trgm:

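-- pg_trgm ships with Postgres but must be enabled once per database
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_products_name_trgm ON products USING GIN (name gin_trgm_ops);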

SELECT id, name
FROM products
WHERE name % 'boot'
ORDER BY name <-> 'boot'
LIMIT 10;

The % operator is trigram similarity: a row passes when its similarity to the input exceeds pg_trgm.similarity_threshold (0.3 by default). The <-> operator is trigram distance, used here to order by closest match. This is a separate index and query path from the full-text search, but they complement each other. Use trigram for autocomplete suggestions and tsvector for the final ranked search.
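For simple type-ahead on whole words, tsquery also has a prefix marker, :*, that runs against the same GIN index as the ranked search. A minimal sketch, assuming the application has already reduced the user's input to a single alphanumeric token before building the query string (to_tsquery raises an error on malformed input):

```sql
-- 'boot:*' matches any lexeme that starts with "boot".
SELECT to_tsvector('english', 'bootstrap themes')
       @@ to_tsquery('english', 'boot:*');
-- returns true

-- The same predicate against the indexed column.
SELECT id, name
FROM products
WHERE search_vector @@ to_tsquery('english', 'boot:*')
ORDER BY ts_rank_cd(search_vector, to_tsquery('english', 'boot:*')) DESC
LIMIT 10;
```

The prefix is stemmed before matching, so partially typed words can behave unexpectedly with the english configuration. The trigram path stays the better fit when suggestions must tolerate typos.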

Keeping the index current

Because search_vector is a generated column, it updates automatically when name, description, or category changes. There is no application-level sync to maintain. If you batch-load data with COPY, the generated column is computed during the load, which is slower than loading plain text. For large bulk imports, consider:

Dropping the GIN index before bulk load.
Loading the data.
Re-creating the index with CREATE INDEX CONCURRENTLY.

This can reduce load time by 60-80% on multi-million-row imports.
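The three steps above can be sketched as follows. The file path and the memory setting are placeholders, not recommendations; COPY from a server-side file also needs the appropriate privileges, and psql's \copy is the client-side equivalent.

```sql
-- 1. Drop the index so the load does not have to maintain it row by row.
DROP INDEX IF EXISTS idx_products_search;

-- 2. Load the data. id and search_vector are generated, so they are left
--    out of the column list. The path is a placeholder.
COPY products (name, description, category)
FROM '/path/to/products.csv' WITH (FORMAT csv, HEADER true);

-- 3. Give the index build more memory for this session, then rebuild it
--    without blocking writes. CREATE INDEX CONCURRENTLY cannot run inside
--    a transaction block.
SET maintenance_work_mem = '1GB';
CREATE INDEX CONCURRENTLY idx_products_search
  ON products USING GIN (search_vector);
```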

Performance and scaling limits

On a table with 500,000 products, the ranked search query above typically executes in 2-8 milliseconds with a warm cache. The GIN index size is roughly 40% of the table size. Here is what to watch:

Index bloat. GIN indexes can bloat under heavy updates because Postgres uses a pending list for fast insertions, then flushes it to the main index structure during vacuum or when the list outgrows gin_pending_list_limit. If autovacuum cannot keep up, searches slow down as they scan the pending list. Monitor pgstatginindex('idx_products_search') from the pgstattuple extension and ensure pending_pages stays low. If you have a write-heavy workload and search latency matters more than insert speed, consider lowering gin_pending_list_limit, disabling fastupdate on the index, or running VACUUM more aggressively.
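The pending list can be inspected and flushed by hand. A short sketch, assuming the pgstattuple extension can be installed in the database:

```sql
-- One-time setup for the inspection function.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- How much is waiting in the pending list right now?
SELECT pending_pages, pending_tuples
FROM pgstatginindex('idx_products_search');

-- Optionally trade slower writes for steadier reads: new entries go
-- straight into the main index structure instead of the pending list.
ALTER INDEX idx_products_search SET (fastupdate = off);

-- Flush whatever is already in the pending list, without a full VACUUM.
SELECT gin_clean_pending_list('idx_products_search');
```

Turning fastupdate off does not flush existing entries by itself, which is why the cleanup call comes after it.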

Ranking large result sets. If a common term matches 100,000 rows, ranking all of them is expensive. Add mandatory filters (category, price range, in-stock flag) to shrink the candidate set before ts_rank_cd runs. If you cannot filter, consider using a materialized view for precomputed top-N results per category.

Multi-language content. to_tsvector('english', ...) only handles English. If you store multilingual text, add a language column of type regconfig (the products table above does not have one) and build an expression index on it:

CREATE INDEX idx_products_search_multilang ON products USING GIN (
  to_tsvector(coalesce(language, 'english'), coalesce(name, '') || ' ' || coalesce(description, ''))
);

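Queries must repeat the exact indexed expression, including the language column, or the planner will not use the index. Build the tsquery with the configuration that matches the user’s language.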
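A query against that index might look like the sketch below. It assumes products has a language column of type regconfig, which is not part of the table defined earlier, and the French search string is only an example.

```sql
SELECT id, name
FROM products
WHERE to_tsvector(coalesce(language, 'english'),
                  coalesce(name, '') || ' ' || coalesce(description, ''))
      @@ plainto_tsquery('french', 'bottes en cuir')
  -- Restrict to rows indexed with the same configuration as the query.
  AND language = 'french'::regconfig;
```

The to_tsvector call mirrors the index definition term for term, which is what allows the planner to match the two.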

When Elasticsearch still wins

Postgres full-text search is not a universal replacement. Reach for Elasticsearch when:

You need fuzzy matching with edit distance (“lether” matching “leather”). Postgres trigrams handle mild typos, but not Levenshtein distance at scale.
You run complex aggregations (faceted search with 20+ dimensions and counts). Postgres GROUP BY works for simple facets, but Elasticsearch aggregations are purpose-built for this.
Your search volume exceeds what a single Postgres primary can serve. A read replica helps, but Elasticsearch is designed to shard search horizontally.
You need geo-spatial search combined with text relevance. Postgres has PostGIS, but combined geo-text ranking is smoother in Elasticsearch.
Your document count is in the tens of millions and growing fast. GIN indexes are fast, but they are not distributed.

For everything else, Postgres search removes a moving part, eliminates sync pipelines, and keeps your data in one transactional system.

Migrating from LIKE queries

The safest migration path is additive:

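Add the generated search_vector column to the existing table (Postgres 12+). Adding a STORED generated column rewrites the table under an exclusive lock and fills in every row as part of that rewrite, so there is no separate backfill; schedule it for a quiet window. Then build the GIN index with CREATE INDEX CONCURRENTLY, which does not block writes.
If the table is too large to rewrite under a lock, add a plain tsvector column instead, keep it current with a trigger, and backfill it with UPDATE products SET search_vector = ... in batches of 10,000 rows so that no single statement holds row locks for long.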
Update application code to query search_vector when a text_search parameter is present, falling back to the old LIKE query if the parameter is absent.
Run both paths in parallel for a week, comparing result quality and latency.
Remove the old LIKE path once metrics confirm the new path is faster and produces better results.

Apart from the table lock taken while the column is added, this approach needs no downtime and gives you an instant rollback if ranking behavior surprises your users.
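The database side of the migration can be sketched as follows, for an existing products table that does not yet have the column. The prepared-statement name search_products is hypothetical.

```sql
-- Step 1: add the generated column (Postgres 12+). This rewrites the table
-- under an ACCESS EXCLUSIVE lock, so run it in a quiet window.
ALTER TABLE products
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(category, '')), 'C')
  ) STORED;

-- Step 2: build the index without blocking writes.
-- CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_products_search
  ON products USING GIN (search_vector);

-- Step 3: the new query path, with the search text as a parameter.
PREPARE search_products(text) AS
  SELECT id, name, ts_rank_cd(search_vector, query) AS rank
  FROM products, plainto_tsquery('english', $1) query
  WHERE search_vector @@ query
  ORDER BY rank DESC
  LIMIT 20;

EXECUTE search_products('leather boots');
```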

Monitoring what matters

Add these checks to your observability stack:

-- Average search latency by query pattern
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
WHERE query LIKE '%search_vector%'
ORDER BY mean_exec_time DESC;

-- GIN pending list (requires CREATE EXTENSION pgstattuple)
SELECT pg_size_pretty(pg_relation_size('idx_products_search')) AS index_size,
       s.pending_pages,
       s.pending_tuples
FROM pgstatginindex('idx_products_search') AS s;

Alert if average search latency crosses 50 ms or if pending_pages keeps growing between vacuum runs. Both indicate that vacuum is falling behind or the query pattern needs stricter filtering.

The takeaway

Elasticsearch is a powerful search engine, but it is also a second database with its own replication, monitoring, backup, and schema migration story. For product catalogs, documentation search, support ticket lookup, and most internal admin tools, PostgreSQL full-text search is fast enough, feature-rich enough, and operationally simpler by an order of magnitude.

The migration is not a rewrite. It is one generated column, one GIN index, and a query that uses ts_rank_cd instead of LIKE. Start there. Measure latency, relevance, and index size. If you hit the scaling limits (fuzzy matching, complex facets, or horizontal sharding), then you will have concrete evidence that Elasticsearch is justified. Until then, you are probably operating two databases because no one checked whether the first one could already do the job.

A note from Yojji

Simplifying infrastructure by using the tools you already own is one of the fastest ways to reduce operational risk. Yojji’s engineering teams regularly help clients evaluate whether their current stack can handle new requirements before adding complexity.

Yojji is an international custom software development company founded in 2016, with offices in Europe, the US, and the UK. Their 50+ senior engineers specialize in Postgres, Node.js, and cloud-native architecture, building systems that stay maintainable as they scale.
