How to Fix PostgreSQL Deadlock Detected on ShareLock Transaction (With Root Cause Analysis)
Threat/Impact Level: HIGH | Downtime Risk: HIGH | Time to Fix: 15–45 mins
TL;DR
- What broke: Two concurrent transactions acquired locks in inverse order — PostgreSQL's deadlock detector killed one to break the cycle, rolling back that transaction entirely.
- How to fix it: Enforce a consistent lock acquisition order across all transactions touching the same rows; use
SELECT FOR UPDATEwith explicit ordering orSKIP LOCKEDfor queue-style workloads. - Use our Client-Side Sandbox below to paste your transaction logic and auto-refactor the lock ordering with zero data leaving your browser.
The Incident (What Does the Error Mean?)
ERROR: deadlock detected
DETAIL: Process 12345 waits for ShareLock on transaction 67890;
blocked by process 67890.
Process 67890 waits for ShareLock on transaction 12345;
blocked by process 12345.
HINT: See server log for query details.
CONTEXT: while updating tuple (0,42) in relation "orders"
PostgreSQL's deadlock detector runs every deadlock_timeout (default: 1 second). When it fires, it picks one transaction as the victim and issues a hard rollback. The application receives this error on the next query execution. The rolled-back transaction's work is entirely lost — your application must detect this error code (40P01) and retry, or the operation silently fails.
ShareLocks in this context are row-level locks held by in-progress transactions, not table-level shared locks. The deadlock occurs when Transaction A holds a lock on Row 1 and wants Row 2, while Transaction B holds Row 2 and wants Row 1.
The Attack Vector / Blast Radius
This is not a one-off failure. In high-concurrency environments this is a recurring production degradation pattern:
- Connection pool exhaustion: Threads waiting on locks pile up. If
deadlock_timeoutis 1s and you have 50 concurrent conflicting transactions, your connection pool saturates before the detector clears them. - Cascading retry storms: Naive retry logic without exponential backoff causes the same transactions to immediately re-conflict, worsening throughput under load.
- Silent data loss: Applications that catch the error without retrying lose writes permanently — especially dangerous in financial ledgers, inventory systems, and order management where the rolled-back transaction updated multiple tables.
- Replication lag amplification: On streaming replicas, the lock contention on primary causes WAL write spikes. Under sustained deadlock storms, replica lag can exceed your RTO.
- ORM blind spots: Hibernate, SQLAlchemy, and ActiveRecord often wrap operations in implicit transactions with non-deterministic lock ordering based on object graph traversal order — making this nearly impossible to debug without query-level logging.
How to Fix It
Basic Fix — Enforce Consistent Lock Ordering
The root cause is always lock acquisition order inversion. Fix it by sorting the rows you intend to lock before acquiring locks.
-- Transaction A and B both update accounts: sender and receiver
-- BAD: Each transaction locks in application-determined order (non-deterministic)
- BEGIN;
- UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- locks row 1
- UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- waits for row 2
- COMMIT;
-- (Concurrent Transaction B)
- BEGIN;
- UPDATE accounts SET balance = balance - 50 WHERE id = 2; -- locks row 2
- UPDATE accounts SET balance = balance + 50 WHERE id = 1; -- DEADLOCK
- COMMIT;
-- GOOD: Always lock in ascending ID order regardless of transaction direction
+ BEGIN;
+ -- Pre-sort: always lock lower ID first
+ SELECT id FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;
+ UPDATE accounts SET balance = balance - 100 WHERE id = 1;
+ UPDATE accounts SET balance = balance + 100 WHERE id = 2;
+ COMMIT;
Enterprise Best Practice — SKIP LOCKED + Advisory Locks + Retry Logic
-- BAD: Blocking SELECT FOR UPDATE with no timeout, no retry handling
- SELECT * FROM job_queue WHERE status = 'pending' FOR UPDATE;
-- GOOD: Non-blocking queue consumption with SKIP LOCKED
+ SELECT * FROM job_queue
+ WHERE status = 'pending'
+ ORDER BY created_at ASC
+ LIMIT 1
+ FOR UPDATE SKIP LOCKED;
-- GOOD: Application-level retry with 40P01 detection (Python/psycopg2 example)
+ import psycopg2
+ from psycopg2 import errors
+ import time, random
+
+ def execute_with_retry(conn, fn, max_retries=5):
+ for attempt in range(max_retries):
+ try:
+ with conn.cursor() as cur:
+ fn(cur)
+ conn.commit()
+ return
+ except errors.DeadlockDetected:
+ conn.rollback()
+ wait = (2 ** attempt) + random.uniform(0, 0.5)
+ time.sleep(wait)
+ raise Exception("Max retries exceeded on deadlock")
-- GOOD: PostgreSQL advisory locks for application-level mutex (no row lock needed)
+ SELECT pg_advisory_xact_lock(hashtext('transfer:' || LEAST(1,2)::text || ':' || GREATEST(1,2)::text));
Key postgresql.conf tuning:
- deadlock_timeout = 1s # Default — too slow for high-concurrency OLTP
+ deadlock_timeout = 100ms # Detect and resolve faster under load
+ lock_timeout = 5000 # Kill queries waiting >5s for a lock (per session or globally)
+ idle_in_transaction_session_timeout = 30000 # Kill abandoned open transactions
💡 Tired of pasting proprietary configs into ChatGPT? Generic AI tools log your company's ARNs, DB strings, and private keys. StackEngine is a zero-backend, pure Client-Side WASM utility. Drop your failing config into the sandbox above. We redact your secrets locally in the browser and auto-generate the refactored code using your own API key.
Prevention in CI/CD
1. Enable log_lock_waits in PostgreSQL — Non-Negotiable
-- postgresql.conf or per-session
log_lock_waits = on
deadlock_timeout = 100ms
This logs every lock wait exceeding deadlock_timeout to pg_log. Feed this into your observability stack (Datadog, Grafana Loki) and alert on deadlock detected log lines.
2. Query-Level Lock Analysis in Staging
-- Run during load tests to catch lock contention before production
SELECT
pid, wait_event_type, wait_event, state,
left(query, 80) AS query_snippet
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY state_change;
3. pgBadger / auto_explain in CI
- pgBadger: Parse PostgreSQL logs in your CI pipeline and fail the build if deadlock frequency exceeds threshold.
- auto_explain: Log execution plans for slow/blocked queries automatically.
- # No query plan logging
+ shared_preload_libraries = 'auto_explain'
+ auto_explain.log_min_duration = 500
+ auto_explain.log_analyze = on
4. Integration Test Lock Order Enforcement
For ORMs, write integration tests that run the same transaction concurrently using threading and assert no 40P01 errors are raised. This catches ORM-level lock order inversion before deployment.
# pytest example — run two conflicting transactions concurrently
import threading, pytest
def test_no_deadlock_on_concurrent_transfer():
errors = []
def transfer(from_id, to_id):
try:
do_transfer(from_id, to_id) # your application function
except DeadlockDetected as e:
errors.append(e)
t1 = threading.Thread(target=transfer, args=(1, 2))
t2 = threading.Thread(target=transfer, args=(2, 1))
t1.start(); t2.start()
t1.join(); t2.join()
assert len(errors) == 0, f"Deadlock detected in concurrent transfer test: {errors}"
5. Checkov / Terraform — Enforce lock_timeout at Infrastructure Level
- # RDS parameter group — no lock timeout set
- resource "aws_db_parameter_group" "pg" {
- # empty
- }
+ resource "aws_db_parameter_group" "pg" {
+ parameter {
+ name = "lock_timeout"
+ value = "5000"
+ }
+ parameter {
+ name = "deadlock_timeout"
+ value = "100"
+ }
+ parameter {
+ name = "idle_in_transaction_session_timeout"
+ value = "30000"
+ }
+ }






















