Shaving every possible minute off a backend CI with more than 1000 tests

Franchesco Romero · 2026-05-26 · via DEV Community

This is the story of many hours down a rabbit hole, choosing between optimizations across package managers, caches, parallelism, and PostgreSQL advisory locks, and of the reality check when I realized the bottleneck was none of the things I had been optimizing, but something I had never measured.

Before we start, the code is available on GitHub so you can follow along step by step:

Code companion — backend CI optimisation post

Stand-alone, copy-pasteable files for every code block in docs/blog-backend-ci-optimization.md.

Each folder maps to one round of the blog narrative:

00-starting-point/   the test job, Dockerfile, conftest before any work
01-uv/               drop pip for uv (Dockerfile + CI snippet)
02-buildkit/         BuildKit cache mounts + setup-buildx + --push
03-xdist-traps/      per-worker DB + DATABASE_URL alignment
04-template-db/      Postgres template DB + pg_advisory_lock
                     (plus the filelock dead-end as documentation)
05-diagnostic/       the measurement step that broke the assumption
06-final/            final shape — 4 matrix shards, no xdist, cumulative
                     Dockerfile + complete conftest

Notes

The xdist code (03, 04) is still present in 06-final/conftest.py for local pytest -n N runs even though the CI dropped xdist in favour of serial matrix shards.
04-template-db/filelock-attempt-DEAD-END.py is kept around…

The starting point

The codebase:

a FastAPI backend (~50 routers, ~45 SQLAlchemy models)
1826 tests, deployed via GitHub Actions to AWS ECS Fargate on arm64.
CI runs on a self-hosted spot runner (4x concurrency, on Graviton).

We started with conventional parallelism, nothing fancy: just two shards via pytest-split.

# .github/workflows/deploy-backend.yml (initial)
- run: cd backend && pip install --cache-dir "$RUNNER_TEMP/pip-cache" -r requirements-dev.txt

- name: Tests (shard ${{ matrix.shard }}/2)
  run: |
    cd backend
    pytest tests/ \
      --splits 2 --group ${{ matrix.shard }} \
      --cov=app --cov-report= \
      -v

A standard docker build, with pip for everything.

# backend/Dockerfile (initial)
FROM python:3.11-slim AS base
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
    libpq-dev gcc curl \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

Attempt no. 1: installing dependencies faster with `uv`

uv is a drop-in replacement for pip built by Astral, typically 10x to 100x faster at resolving and installing packages. The migration is simple: two lines in the Dockerfile and you're done:

# Pin via the official image, used as a multi-stage source, so the binary is reproducible
COPY --from=ghcr.io/astral-sh/uv:0.5.11 /uv /uvx /usr/local/bin/
RUN uv pip install --system --no-cache-dir -r requirements.txt

requirements.txt and requirements-dev.txt stay exactly the same; uv pip reads them natively. There is no need to migrate to pyproject.toml.

In CI, we added the corresponding action:

- uses: astral-sh/setup-uv@v6
  with:
    version: "0.5.11"
    enable-cache: true
    cache-suffix: "shard-${{ matrix.shard }}"
- run: cd backend && uv pip install --system -r requirements-dev.txt

Here the per-shard cache-suffix replicates the per-shard --cache-dir workaround we had with pip. Without it, two shards that land on the same self-hosted runner fight over the same tarball, and one of them dies with tar exit code 2.

The comparison, in numbers:

$ time uv pip install -r requirements.txt
...
uv pip install -r requirements.txt  1.23s user 1.59s system 51% cpu 5.509 total

5.5s, versus ~60s with pip and almost 80s on the runner.

Attempt no. 2: BuildKit cache mounts in the Dockerfile

Docker's layer cache only helps when the COPY doesn't invalidate the layers that follow it. Any change to requirements.txt rebuilds everything after it. BuildKit cache mounts persist their contents between builds regardless of layer invalidation:

# backend/Dockerfile (with cache mounts)
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update \
    && apt-get upgrade -y \
    && apt-get install -y --no-install-recommends \
        libpq-dev gcc curl

COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --system -r requirements.txt

Two things worth noting:

Drop the rm -rf /var/lib/apt/lists. It was there to shrink the image, but with cache mounts those paths live in BuildKit's cache instead of the image layer, so there is nothing left to clean up, and deleting them would throw away the cache you just mounted.
sharing=locked serializes concurrent access to the mount. Without it, two parallel builds on the same runner can corrupt the cache.

The runner, being self-hosted, ships with Docker's legacy builder by default, and the first push after the cache-mount commit broke with:

the --mount option requires BuildKit. Refer to
https://docs.docker.com/go/buildkit/ to learn how to build images with
BuildKit enabled

The fix was about as easy as it gets: install buildx and alias docker build to docker buildx build:

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
  with:
    install: true

install: true is the option that matters. Without it, docker build keeps using the legacy builder.

Buildx doesn't load the image into the local daemon by default, so the docker push step we already had stopped working too. We switched to --push directly:

- run: |
    cd backend
    docker build \
      --cache-from $IMAGE:latest \
      --cache-to type=inline \
      -t $IMAGE:sha-$TAG \
      -t $IMAGE:latest \
      --push \
      .

--cache-to type=inline embeds the layer-cache metadata in the image itself, so the next build's --cache-from pulls it back from ECR.

The first build still pays for installing dependencies and packages; subsequent builds with the same requirements skip it, going from ~40s to ~10s on cache hits.

Attempt no. 3: the `pytest-xdist` trap

The instinct is that more parallel processes means more time saved: 2 shards × 4 workers = 8 processes in parallel.

# backend/requirements-dev.txt
pytest-xdist==3.6.1

# .github/workflows/deploy-backend.yml
pytest tests/ \
  --splits 2 --group ${{ matrix.shard }} \
  -n 4 --dist worksteal \
  --cov=app --cov-report= \
  -v

But this is where the trouble starts.

Trap 1: `DROP SCHEMA` between workers

In my setup, the conftest drops and recreates the schema at the start of every session:

@pytest_asyncio.fixture(scope="session", autouse=True)
async def setup_db():
    async with test_engine.begin() as conn:
        await conn.execute(sa.text("DROP SCHEMA public CASCADE"))
        await conn.execute(sa.text("CREATE SCHEMA public"))
    async with test_engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)

scope="session" means once per pytest session. With pytest-xdist, each worker is its own session.

With four workers pointing at the same myapp_test database, they keep dropping each other's schema, and runs sometimes die partway through.

Fix: one database per worker, suffixed with PYTEST_XDIST_WORKER:

def _suffix_dburl(url: str, worker: str) -> str:
    if "?" in url:
        url_part, _, query = url.partition("?")
        query = f"?{query}"
    else:
        url_part, query = url, ""
    if "/" not in url_part.split("@", 1)[-1]:
        return url
    prefix, dbname = url_part.rsplit("/", 1)
    return f"{prefix}/{dbname}_{worker}{query}"

_XDIST_WORKER = os.environ.get("PYTEST_XDIST_WORKER")
if _XDIST_WORKER:
    for key in ("DATABASE_URL", "TEST_DATABASE_URL"):
        if val := os.environ.get(key):
            os.environ[key] = _suffix_dburl(val, _XDIST_WORKER)

With this, worker gw0 gets myapp_test_gw0, gw1 gets myapp_test_gw1, and so on.

Trap 2: `max_locks_per_transaction`

First run with -n auto (10 workers on the self-hosted runner):

================== 31 passed, 11 warnings, 8 errors in 14.73s ==================

DROP SCHEMA CASCADE over ~50 tables takes a relation-level lock on each one.

10 workers x 50 = 500 locks.

PostgreSQL's default for max_locks_per_transaction is 64.

The fix: cap xdist at 4 workers per shard instead of auto. 4 x 2 shards = 8 parallel processes.
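As a rough way to reason about lock pressure, the sketch below multiplies workers by locked objects and compares the result with the size of PostgreSQL's shared lock table, which the PostgreSQL documentation gives as max_locks_per_transaction × (max_connections + max_prepared_transactions). The function names and the `objects_per_table` multiplier are hypothetical. The multiplier exists because the 50-per-worker figure counts tables only; how many additional objects (indexes, sequences, and so on) a given schema locks is an assumption you would have to measure.

```python
def estimated_locks(workers: int, tables: int, objects_per_table: int = 1) -> int:
    """Relation locks held at once if every worker drops its schema simultaneously."""
    return workers * tables * objects_per_table


def lock_table_capacity(
    max_locks_per_transaction: int = 64,   # PostgreSQL stock default
    max_connections: int = 100,            # PostgreSQL stock default
    max_prepared_transactions: int = 0,    # PostgreSQL stock default
) -> int:
    """Slots in the shared lock table, per the formula in the PostgreSQL docs."""
    return max_locks_per_transaction * (max_connections + max_prepared_transactions)


for workers in (10, 4):
    print(workers, "workers ->", estimated_locks(workers, tables=50), "table locks")
print("lock table slots with stock settings:", lock_table_capacity())
```

With the post's numbers this gives 500 table locks for 10 workers and 200 for 4. Treat it as a back-of-the-envelope estimate of how demand scales with worker count, not as a prediction of when the error appears.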

Trap 3: `DATABASE_URL` divergence

After capping the workers, one test started failing in CI:

test_comp_expiry_worker_skips_stripe_managed
  sqlalchemy.exc.ProgrammingError: relation "subscriptions" does not
  exist

Why? The test invokes a background worker:

async def test_comp_expiry_worker_skips_stripe_managed(client, db, admin):
    member = await _seed_user(db, tier="TEST")
    sub = await _seed_sub(db, member, tier="TEST", ...)
    await db.commit()

    from app.workers.comp_expiry_worker import expire_comp_subscriptions
    await expire_comp_subscriptions()  # ← opens its own AsyncSessionLocal

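The worker's AsyncSessionLocal is bound to the engine the app builds from DATABASE_URL at import time, which pointed at a different database from the one the fixtures had prepared. Fix: align DATABASE_URL with TEST_DATABASE_URL before app.main imports anything:

# backend/tests/conftest.py (top of the file, before importing app.main)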
_test_db = os.environ.get("TEST_DATABASE_URL")
if _test_db:
    os.environ["DATABASE_URL"] = _test_db

_XDIST_WORKER = os.environ.get("PYTEST_XDIST_WORKER")
if _XDIST_WORKER:
    for key in ("DATABASE_URL", "TEST_DATABASE_URL"):
        if val := os.environ.get(key):
            os.environ[key] = _suffix_dburl(val, _XDIST_WORKER)

from app.main import app  # ← engine built with the right URL from the start
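The ordering matters more than the code itself, so it can help to factor the alignment into a pure function over a mapping and test it without touching `os.environ`. This is a sketch, not the post's conftest: `aligned_db_env` is a hypothetical name, the sample URLs are made up, and `_suffix_dburl` is repeated from above so the block runs on its own.

```python
from typing import Mapping, Optional


def _suffix_dburl(url: str, worker: str) -> str:
    """Append the xdist worker id to the database name of a connection URL."""
    if "?" in url:
        url_part, _, query = url.partition("?")
        query = f"?{query}"
    else:
        url_part, query = url, ""
    if "/" not in url_part.split("@", 1)[-1]:
        return url
    prefix, dbname = url_part.rsplit("/", 1)
    return f"{prefix}/{dbname}_{worker}{query}"


def aligned_db_env(env: Mapping[str, str], worker: Optional[str]) -> dict:
    """Return a copy of env where both URL variables point at the same per-worker DB."""
    out = dict(env)
    # Step 1: the app reads DATABASE_URL, the fixtures read TEST_DATABASE_URL; make them agree.
    if out.get("TEST_DATABASE_URL"):
        out["DATABASE_URL"] = out["TEST_DATABASE_URL"]
    # Step 2: under xdist, give each worker its own database.
    if worker:
        for key in ("DATABASE_URL", "TEST_DATABASE_URL"):
            if out.get(key):
                out[key] = _suffix_dburl(out[key], worker)
    return out


# Hypothetical values: the two variables start out pointing at different databases.
env = {
    "DATABASE_URL": "postgresql://appuser:testpass@localhost:5432/myapp_dev",
    "TEST_DATABASE_URL": "postgresql+asyncpg://appuser:testpass@localhost:5432/myapp_test",
}
result = aligned_db_env(env, worker="gw2")
print(result["DATABASE_URL"])
print(result["TEST_DATABASE_URL"])
```

In a conftest, the result would be written back to `os.environ` before the first `from app.main import app`, since the engine is built at import time.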

Attempt no. 4: PostgreSQL template database

Each worker still runs DROP SCHEMA plus ~50 CREATE TABLE statements at the start of its session.

On the ARM runner that is ~5 seconds per worker.

PostgreSQL has a trick: CREATE DATABASE ... TEMPLATE clones a database with a file-level copy in ~100ms instead of replaying all the SQL.

You build the schema once in a dedicated template database, and then each worker clones it:

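# backend/tests/conftest.py
import sqlalchemy as sa
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.pool import NullPool

TEMPLATE_DB_NAME = "myapp_test_template"
_TEMPLATE_LOCK_ID = 7321456789012345  # arbitrary int; must fit in a signed 64-bit bigint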

async def _ensure_template_db(admin_url: str, template_url: str) -> None:
    admin_engine = create_async_engine(
        admin_url, isolation_level="AUTOCOMMIT", poolclass=NullPool
    )
    try:
        async with admin_engine.connect() as conn:
            exists = (await conn.execute(
                sa.text("SELECT 1 FROM pg_database WHERE datname = :n"),
                {"n": TEMPLATE_DB_NAME},
            )).scalar_one_or_none()
            if not exists:
                await conn.execute(
                    sa.text(f'CREATE DATABASE "{TEMPLATE_DB_NAME}"')
                )
    finally:
        await admin_engine.dispose()

    tmpl_engine = create_async_engine(template_url, poolclass=NullPool)
    try:
        async with tmpl_engine.connect() as conn:
            has_schema = (await conn.execute(sa.text(
                "SELECT 1 FROM information_schema.tables "
                "WHERE table_schema = 'public' AND table_name = 'users' "
                "LIMIT 1"
            ))).scalar_one_or_none()
        if not has_schema:
            async with tmpl_engine.begin() as conn:
                await conn.run_sync(Base.metadata.create_all)
    finally:
        await tmpl_engine.dispose()


async def _clone_template_to(admin_url: str, dbname: str) -> None:
    admin_engine = create_async_engine(
        admin_url, isolation_level="AUTOCOMMIT", poolclass=NullPool
    )
    try:
        async with admin_engine.connect() as conn:
            # Kill stale connections so the DROP doesn't block
            await conn.execute(sa.text(
                "SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
                "WHERE datname = :n AND pid <> pg_backend_pid()"
            ), {"n": dbname})
            await conn.execute(sa.text(f'DROP DATABASE IF EXISTS "{dbname}"'))
            await conn.execute(sa.text(
                f'CREATE DATABASE "{dbname}" TEMPLATE "{TEMPLATE_DB_NAME}"'
            ))
    finally:
        await admin_engine.dispose()

Trap 4: `filelock` hangs

The workers race to create the template. The first attempt used filelock:

# Don't do this
import filelock
lock = filelock.FileLock("/tmp/myapp_test_template.lock", timeout=120)
await asyncio.to_thread(lock.acquire)

The workers consistently hung until they hit the 120s timeout.

Switching to a PostgreSQL advisory lock solved it. It lives in the same shared resource we already depend on, and it is released automatically when the connection closes:

async def _setup_db_via_template() -> None:
    split = _split_db_url(TEST_DATABASE_URL)
    if split is None:
        return
    prefix, dbname = split
    admin_url = f"{prefix}/postgres"
    template_url = f"{prefix}/{TEMPLATE_DB_NAME}"

    admin_engine = create_async_engine(
        admin_url, isolation_level="AUTOCOMMIT", poolclass=NullPool
    )
    try:
        async with admin_engine.connect() as conn:
            # Blocks until acquired; released automatically when the connection closes
            await conn.execute(
                sa.text("SELECT pg_advisory_lock(:id)"),
                {"id": _TEMPLATE_LOCK_ID},
            )
            try:
                await _ensure_template_db(admin_url, template_url)
            finally:
                await conn.execute(
                    sa.text("SELECT pg_advisory_unlock(:id)"),
                    {"id": _TEMPLATE_LOCK_ID},
                )
    finally:
        await admin_engine.dispose()

    await _clone_template_to(admin_url, dbname)

The first worker to grab the lock builds the template (~5s); the rest wait and then clone it in ~100ms each.
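The lock id in the snippet is an arbitrary hardcoded int. If several suites share one PostgreSQL server, one option is to derive the id from a name so that each suite gets its own lock. The sketch below is an alternative to the post's approach, not what it does: `advisory_lock_id` is a hypothetical helper, and the one hard requirement it encodes is that `pg_advisory_lock` takes a signed 64-bit bigint.

```python
import hashlib


def advisory_lock_id(name: str) -> int:
    """Map a name to a stable signed 64-bit integer usable with pg_advisory_lock."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # First 8 bytes of the hash, read as a signed big-endian integer (bigint range).
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


lock_id = advisory_lock_id("myapp_test_template")
print(lock_id)
```

Because the value depends only on the name, every worker process computes the same id without any coordination.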

With all of this in place (a database per worker, DATABASE_URL alignment, template clones, advisory locks), a local smoke run over 7 files showed the difference:

88 passed, 5 warnings in 12.28s

It dropped from ~2 minutes with the initial setup to a few seconds.

Attempt no. 5: the measurement that broke the assumption

Wall time per shard was still around 7 minutes.
Total CI time was 11 minutes: a modest improvement over the initial baseline, but not the dramatic drop the local smoke run suggested.

Time to measure properly.
I added a diagnostic step that runs once and reports timings:

- name: Diagnose pytest startup cost
  if: matrix.shard == 1
  run: |
    cd backend
    echo "::group::A: pytest collection only, no coverage"
    time pytest tests/ --co -q --no-header --no-cov
    echo "::endgroup::"
    echo "::group::B: pytest collection only, WITH coverage"
    time pytest tests/ --co -q --no-header --cov=app --cov-report=
    echo "::endgroup::"
    echo "::group::C: small serial run, no coverage, no xdist"
    time pytest tests/test_critical.py -q --no-header --no-cov
    echo "::endgroup::"

The results on the self-hosted ARM runner:

A: collection only, no coverage            real    0m18.909s
B: collection only, WITH coverage          real    0m23.810s
C: small serial run, nothing extra         real    0m21.481s

The key observation: collecting 1826 tests and running a single small file without coverage take about the same time.

In other words, the cost is not collection.
It is not coverage (only a 5s difference).
It is not test execution.
It is pytest startup plus the conftest import.
Specifically, the from app.main import app in the conftest, which pulls in ~45 models, ~50 routers, middleware, settings, everything in one go. Twenty seconds of cold import on this runner.

Every time.

With xdist, each of the 4 workers pays this 20s cost independently.

That's where the missing ~80s went.
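One way to confirm that the import itself is the slow part is to time it in a fresh interpreter, so nothing is already cached in `sys.modules`. A minimal sketch, assuming the module is importable from the current working directory; `cold_import_seconds` is a hypothetical helper, and `json` stands in for `app.main` so the example runs anywhere.

```python
import subprocess
import sys
import time


def cold_import_seconds(module: str) -> float:
    """Time `import <module>` in a brand-new Python process, interpreter startup included."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


baseline = cold_import_seconds("sys")   # roughly the bare interpreter startup
elapsed = cold_import_seconds("json")   # swap in "app.main" inside the backend directory
print(f"interpreter: {baseline:.2f}s, with import: {elapsed:.2f}s")
```

For a per-module breakdown, CPython's `python -X importtime -c "import app.main"` prints the cumulative import time of every module to stderr.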

Round 6: dropping `xdist`, going to 4 shards

If every pytest process pays a fixed 20s of startup, the cheapest optimization is to start fewer of them.
2 shards x 4 xdist workers = 10 pytest startups (2 controllers + 8 workers).
4 shards x 1 serial process = 4 pytest startups.
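The startup count can be written down as a tiny function. This is only the counting argument from the two lines above, combined with the ~20s cold-import figure from the diagnostic step; `pytest_startups` is a hypothetical name, and the seconds it reports are aggregate process-seconds, not wall-clock time.

```python
STARTUP_SECONDS = 20  # approximate cold conftest import on this runner, per the diagnostic step


def pytest_startups(shards: int, xdist_workers: int = 0) -> int:
    """Number of pytest processes that each pay the startup cost."""
    # With xdist, every shard runs one controller plus its workers; without it, one process.
    per_shard = (1 + xdist_workers) if xdist_workers else 1
    return shards * per_shard


for label, shards, workers in (
    ("2 shards x 4 xdist workers", 2, 4),
    ("4 serial shards", 4, 0),
):
    n = pytest_startups(shards, workers)
    print(f"{label}: {n} startups, ~{n * STARTUP_SECONDS}s of startup in total")
```

Under these assumptions the xdist layout spends roughly 200 process-seconds on startup and the serial layout roughly 80.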

test:
  runs-on: ${{ inputs.runner || 'self-hosted' }}
  needs: lint
  strategy:
    fail-fast: false
    matrix:
      shard: [1, 2, 3, 4]
  services:
    postgres:
      image: postgres:15
      env: { POSTGRES_DB: myapp_test, POSTGRES_USER: appuser, POSTGRES_PASSWORD: testpass }
      options: >-
        --health-cmd pg_isready --health-interval 10s
        --health-timeout 5s --health-retries 5
      ports: ["5432"]
  steps:
    - uses: actions/checkout@v5
    - uses: actions/setup-python@v6
      with: { python-version: "3.11" }
    - uses: astral-sh/setup-uv@v6
      with:
        version: "0.5.11"
        enable-cache: true
        cache-suffix: "shard-${{ matrix.shard }}"
    - run: cd backend && uv pip install --system -r requirements-dev.txt
    - name: Tests (shard ${{ matrix.shard }}/4)
      env:
        DATABASE_URL: postgresql://appuser:testpass@localhost:${{ job.services.postgres.ports[5432] }}/myapp_test
        TEST_DATABASE_URL: postgresql+asyncpg://appuser:testpass@localhost:${{ job.services.postgres.ports[5432] }}/myapp_test
        SECRET_KEY: ci-test-secret-key-32bytes-minimum-length
        ENV: test
        COVERAGE_FILE: .coverage.${{ matrix.shard }}
      run: |
        cd backend
        pytest tests/ \
          --splits 4 --group ${{ matrix.shard }} \
          --cov=app --cov-report= \
          -q --no-header

Each shard runs ~450 tests serially in a single pytest process.

No xdist fan-out, no conftest re-import per worker, no CPU contention during cold imports.

The self-hosted runner advertises 4x concurrency, so all four shards run in parallel.

Coverage is now spread across four files, which a follow-up job combines:

coverage:
  needs: test
  steps:
    - uses: actions/checkout@v5
    - uses: actions/setup-python@v6
      with: { python-version: "3.11" }
    - uses: astral-sh/setup-uv@v6
      with: { version: "0.5.11" }
    - run: uv pip install --system coverage==7.6.1
    - uses: actions/download-artifact@v5
      with:
        pattern: coverage-*
        path: backend/
        merge-multiple: true
    - run: |
        cd backend
        coverage combine .coverage.1 .coverage.2 .coverage.3 .coverage.4
        coverage report --fail-under=60

I also swapped -v for -q --no-header. Under xdist, -v buffers output per worker until a test finishes; -q prints output right away.

The result

A real CI run after the push:

✓ lint              in 1m6s
✓ test (1)          in 6m46s
✓ test (2)          in 6m44s
✓ test (3)          in 6m57s
✓ test (4)          in 6m57s
✓ coverage          in 10s
✓ build-and-deploy  in 1m53s

Timeline of the test step on shard 1:

16:31:44  step starts
16:32:57  [pytest-split] Running group 1/4   ← 1m13s in
16:33:25  ........... [ 15%]                  ← first dot at 1m41s
16:34:18  ........... [ 45%]
16:37:04  ........... [ 91%]
16:37:47  ........... [100%]  472 passed in 5m21s

Time to first output: 1m13s vs 2m37s before.
Roughly half.

Total CI: 10m16s vs ~20m for the baseline before any optimization.
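The offsets in the timeline are plain differences between log timestamps. A small sketch for checking them, assuming all timestamps fall on the same day; `elapsed` is a hypothetical helper.

```python
from datetime import datetime


def elapsed(start: str, end: str) -> str:
    """Format the gap between two HH:MM:SS timestamps as e.g. '1m13s'."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    seconds = int(delta.total_seconds())
    return f"{seconds // 60}m{seconds % 60}s"


step_start = "16:31:44"
print(elapsed(step_start, "16:32:57"))  # until pytest-split reports its group
print(elapsed(step_start, "16:33:25"))  # until the first progress dots
print(elapsed(step_start, "16:37:47"))  # the whole test step
```

The first two values match the 1m13s and 1m41s annotations in the timeline; the third shows the step took a little over six minutes end to end, of which pytest itself reported 5m21s.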

Other things I tried that definitely didn't help

Precompiling pyc files (python -m compileall). Local measurement: 13.0s cold vs 12.6s warm.
pytest-xdist --dist worksteal is fine when each worker has a comparable setup cost. When setup is ~20s and the tests are mostly fast, the per-worker startup tax eats the parallelism gain.
filelock for cross-process serialization. It didn't serialize the workers reliably in my setup. I switched to PG advisory locks.
The -v flag. It caused 2 minutes of output buffering under xdist with no performance benefit.

What would help in the future

Slimming down the conftest. The from app.main import app is the single biggest cost in my case. The app imports every router, every model, and every service at startup. Splitting it up (lazy router registration, or breaking up the monolithic import) would cut into those 20s of startup.
A second pass on the shards. pytest-split balances by duration. If one shard consistently runs 30s behind, rebalance:

   pytest --splits 4 --group 1 --store-durations

and commit a new .test_durations file that future runs balance against.

Taking coverage out of the hot path. Coverage only added ~5s on ARM in our benchmark, but ~5s × 4 shards = 20s saved. The trade-off: either accept it in the test job or add a separate, non-parallel coverage job. We left it in the path.
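To picture what duration-based balancing buys in the shard rebalancing item above, here is a simplified greedy assignment that always hands the next-longest test to the least-loaded shard. It illustrates the general idea only and is not pytest-split's implementation; the test names and durations are invented.

```python
import heapq


def balance(durations: dict, shards: int) -> list:
    """Greedy longest-first assignment of tests to shards; one list of names per shard."""
    # Heap of (total_seconds, shard_index) so the least-loaded shard is always on top.
    heap = [(0.0, i) for i in range(shards)]
    heapq.heapify(heap)
    groups = [[] for _ in range(shards)]
    for name, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (total + seconds, idx))
    return groups


# Invented durations in seconds, standing in for a stored durations file.
durations = {"test_a": 9.0, "test_b": 7.0, "test_c": 4.0, "test_d": 3.0, "test_e": 1.0}
groups = balance(durations, shards=2)
for i, group in enumerate(groups, start=1):
    print(f"shard {i}: {group} -> {sum(durations[t] for t in group):.1f}s")
```

With these sample numbers both shards end up with 12 seconds of work, which is the property that stale duration data gradually erodes as tests are added or slow down.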

Lessons

Measure before you optimize. Several hours of work on xdist, template DBs, and BuildKit were useful, but the real unlock came from a 30-second diagnostic step that told me the bottleneck was pytest startup, not anything I had been attacking.
The fastest pytest is the one you don't start twice. Every pytest invocation pays a fixed startup cost. With a heavy conftest, that cost dominates everything else. More parallelism = more startups = more cost. Past a certain threshold, fewer, larger shards beat more, smaller ones.
pytest-xdist is not free. It works well when the cost per test >> the startup cost. When startup is 20s and tests take 500ms, the equation flips.
Per-worker resources need per-worker isolation. The subtle bug was that background workers opened their own AsyncSessionLocal pointing at the wrong database. The fix was not in the application; it was in the test conftest, aligning the env vars before the app imported anything.
PostgreSQL has the primitives. Advisory locks for cross-process synchronization, CREATE DATABASE ... TEMPLATE for schema bootstrap. Both saved me from inventing weaker mechanisms on top.
BuildKit cache mounts are still underused. Two lines in a Dockerfile (--mount=type=cache for the apt and uv/pip caches) took repeated docker builds from ~40s to ~10s, but only after switching away from Docker's legacy builder via setup-buildx-action.

The complete code is available on GitHub, in the companion repository described at the start of this post.
