DevOps & Deployment Essentials: Your Practical CI/CD Guide
Vected Techn · 2026-05-21 · via DEV Community

If you're still deploying code by SSHing into a server and running git pull, this guide is for you.
Modern deployment isn't magic. It's automation. Consistency. Confidence.
Let's build that.

Part 1: The Pipeline – From Code to Production
Every production application needs this flow:
Code → Build → Test → Deploy → Monitor

Feedback & Rollback if needed
Each stage is automated. When something fails, you catch it before users do.
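To make the fail-fast idea concrete, here is a minimal sketch in JavaScript. It assumes each stage is a plain function that returns `false` or throws on failure; `runPipeline` and the stage functions are hypothetical names for illustration, not part of any CI product.

```javascript
// Minimal sketch of a fail-fast pipeline: stages run in order and the
// run stops at the first failure, so later stages (like deploy) never start.
// The stage functions are hypothetical placeholders.
function runPipeline(stages) {
  const completed = [];
  for (const { name, run } of stages) {
    let ok;
    try {
      ok = run() !== false; // a stage fails by returning false or throwing
    } catch (err) {
      ok = false;
    }
    if (!ok) {
      return { success: false, failedStage: name, completed };
    }
    completed.push(name);
  }
  return { success: true, failedStage: null, completed };
}

const result = runPipeline([
  { name: 'build', run: () => true },
  { name: 'test', run: () => false }, // simulate a failing test suite
  { name: 'deploy', run: () => true },
]);
console.log(result);
```

Because the loop returns as soon as a stage fails, the deploy stage is never reached when tests fail. That is the behavior a real CI system provides through job dependencies.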
Stage 1: Code Commit (Git Workflow)
Your repository structure matters. A lot.
my-app/
├── .github/workflows/ # CI/CD pipelines
├── src/ # Source code
├── tests/ # Unit & integration tests
├── docker/ # Dockerfile & related
├── k8s/ # Kubernetes manifests
├── terraform/ # Infrastructure as code
├── README.md
└── .gitignore
Branch strategy that works:
main (production)
↑ (merge only via PR)
develop (staging)
↑ (merge feature branches)
feature/new-dashboard (your work)
feature/user-auth (teammate's work)
hotfix/critical-bug (urgent fix)
Rule: Never push to main directly. Always go through develop, create a pull request, get code review, run automated tests.
```bash
# You're working on a feature
git checkout -b feature/new-dashboard
git commit -m "feat(dashboard): add charts"
git push origin feature/new-dashboard

# Create PR on GitHub/GitLab
# Automated tests run
# Code review happens
# Merge to develop
# Automated deploy to staging
# Test on staging
# When ready, merge develop → main
# Automated deploy to production
```


Stage 2: Build (Docker)
Stop deploying Python/Node/Go directly. Use containers.
```dockerfile
# Dockerfile
FROM node:18-alpine

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install dependencies
RUN npm ci

# Copy source
COPY src ./src

# Health check
HEALTHCHECK --interval=30s --timeout=5s \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => {if (r.statusCode !== 200) throw new Error(r.statusCode)})"

# Expose port
EXPOSE 3000

# Start app
CMD ["npm", "start"]
```
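The health check above expects the app to answer `GET /health` on port 3000 with a 200 status. The article does not show that endpoint, so the following is only a sketch of one way to provide it with Node's built-in `http` module. `buildHealthResponse`, `createServer`, and the `database` check are assumed names for illustration.

```javascript
const http = require('node:http');

// Pure function: turns dependency check results into an HTTP status and body.
// `checks` maps a dependency name to true (healthy) or false (unhealthy).
function buildHealthResponse(checks) {
  const healthy = Object.values(checks).every(Boolean);
  return {
    statusCode: healthy ? 200 : 503,
    body: JSON.stringify({ status: healthy ? 'ok' : 'unhealthy', checks }),
  };
}

// Hypothetical wiring: a bare http server that answers GET /health
// and returns 404 for everything else.
function createServer(getChecks) {
  return http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/health') {
      const { statusCode, body } = buildHealthResponse(getChecks());
      res.writeHead(statusCode, { 'Content-Type': 'application/json' });
      res.end(body);
      return;
    }
    res.writeHead(404);
    res.end();
  });
}

// Usage (not started here so the sketch has no side effects):
// createServer(() => ({ database: true })).listen(3000);
```

Returning 503 when a dependency is down gives the container runtime a clear signal that the instance is unhealthy. Keeping the decision in a pure function also makes it easy to unit test.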
Why Docker:

✅ "Works on my machine" → "Works everywhere"
✅ Locks the versions of all dependencies
✅ Easy to scale (run 10 copies)
✅ Security isolation
✅ Simple rollback (just switch image version)

Build optimization:
```dockerfile
# Bad: Bloated image
FROM node:18
COPY . .
RUN npm install
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]

# Result: ~500MB image
```

```dockerfile
# Good: Multi-stage build
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY src ./src
RUN npm run build

FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package.json .
EXPOSE 3000
CMD ["npm", "start"]

# Result: ~120MB image
```

Stage 3: Test (Automated)
Your CI pipeline runs tests automatically:
```yaml
# .github/workflows/ci.yml
name: CI Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [develop]

jobs:
  test:
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:15
        env:
          # User and database must match DATABASE_URL below
          POSTGRES_USER: user
          POSTGRES_PASSWORD: testpass
          POSTGRES_DB: testdb
        ports:
          - 5432:5432 # Publish the port so the runner can reach localhost
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v3

      - uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm ci

      - name: Run linting
        run: npm run lint

      - name: Run unit tests
        run: npm run test:unit

      - name: Run integration tests
        run: npm run test:integration
        env:
          DATABASE_URL: postgres://user:testpass@localhost:5432/testdb

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/lcov.info

  build:
    needs: test # Only run if tests pass
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Build Docker image
        run: |
          docker build -t myapp:${{ github.sha }} .
          docker tag myapp:${{ github.sha }} myapp:latest

      - name: Push to registry
        run: |
          # Pass the token on stdin so it does not appear in the process list
          echo "${{ secrets.REGISTRY_TOKEN }}" | docker login -u "${{ secrets.REGISTRY_USER }}" --password-stdin
          docker push myapp:${{ github.sha }}
```

What this does:

Runs linting (catch style issues)
Runs unit tests (catch logic errors)
Runs integration tests with real database (catch integration issues)
Only if ALL pass, build Docker image
Push to registry

One failing test = no deploy. That's the point.
Stage 4: Deploy (Multiple Strategies)
Blue-Green Deployment (Safest)
Blue (current): v1.2.3 (users hitting this)
Green (new): v1.3.0 (being deployed)

Steps:

  1. Deploy v1.3.0 to green
  2. Run smoke tests on green
  3. If good: Switch traffic from blue → green
  4. If bad: Switch back to blue (instant rollback)
  5. Keep blue running for 1 hour (safety net)

Canary Deployment (Progressive)
Version 1.2.3: 95% of traffic
Version 1.3.0: 5% of traffic

Monitor:

  • Error rates
  • Response times
  • Business metrics

If all good, shift traffic:

  • 1.3.0: 25% → 50% → 100%

If problems appear, rollback immediately
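
The promote-or-rollback decision can be sketched as a small function. This is an illustration only: the traffic steps come from the text above, while the thresholds (`maxErrorRateDelta`, `maxLatencyRatio`) and the metric field names are assumptions you would tune for your own service.

```javascript
// Traffic steps taken from the text: 5% → 25% → 50% → 100%.
const CANARY_STEPS = [5, 25, 50, 100];

// Decide the next canary action by comparing the canary against the baseline.
// The default thresholds are assumed values, not recommendations.
function nextCanaryAction(currentPercent, baseline, canary, limits = {}) {
  const { maxErrorRateDelta = 0.01, maxLatencyRatio = 1.2 } = limits;
  const errorsWorse = canary.errorRate - baseline.errorRate > maxErrorRateDelta;
  const latencyWorse = canary.p95Ms > baseline.p95Ms * maxLatencyRatio;
  if (errorsWorse || latencyWorse) {
    return { action: 'rollback', percent: 0 }; // send all traffic back to the old version
  }
  const next = CANARY_STEPS.find((step) => step > currentPercent);
  if (next === undefined) {
    return { action: 'done', percent: 100 }; // already at full traffic
  }
  return { action: 'promote', percent: next };
}

const baseline = { errorRate: 0.01, p95Ms: 200 };
console.log(nextCanaryAction(5, baseline, { errorRate: 0.012, p95Ms: 210 }));
console.log(nextCanaryAction(25, baseline, { errorRate: 0.05, p95Ms: 210 }));
```

Comparing the canary against the current version, instead of against a fixed number, keeps the check meaningful when overall traffic or load shifts during the rollout.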
Rolling Deployment (Traditional)
Deploy gradually:

  1. Take 1 instance down, deploy new version
  2. Bring it up
  3. Repeat for next instance
  4. Users never experience full downtime

Downsides: Temporarily running mixed versions (harder to debug)
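
The rolling steps and the mixed-version downside can both be seen in a small simulation. This sketch assumes a fleet is just an array of objects and that `deploy` is a hypothetical callback returning `true` when an instance comes back healthy. Real orchestrators handle this for you.

```javascript
// Simulate a rolling update over a list of instances, one at a time.
// `deploy` is a hypothetical callback that returns true when the instance
// comes back healthy on the new version.
function rollingUpdate(instances, newVersion, deploy) {
  const fleet = instances.map((i) => ({ ...i })); // work on a copy
  let minInService = fleet.length;
  for (const instance of fleet) {
    instance.inService = false; // step 1: take one instance down
    const serving = fleet.filter((i) => i.inService).length;
    minInService = Math.min(minInService, serving);
    if (!deploy(instance, newVersion)) {
      // Stop here: the remaining instances stay on the old version
      return { ok: false, fleet, minInService };
    }
    instance.version = newVersion;
    instance.inService = true; // step 2: bring it back up
  }
  return { ok: true, fleet, minInService };
}

const fleet = ['a', 'b', 'c'].map((id) => ({ id, version: '1.2.3', inService: true }));
const outcome = rollingUpdate(fleet, '1.3.0', () => true);
console.log(outcome.ok, outcome.minInService);
```

With three instances, at least two keep serving at every step. If a deploy fails partway through, the returned fleet contains both versions, which is the mixed state the downside above refers to.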

Part 2: Container Orchestration (Kubernetes Essentials)
You don't need to be a Kubernetes expert. You need to know:
Basic Concepts
Pod = Smallest unit (like a container wrapper)
Service = How pods talk to each other + expose to outside
Deployment = How you define what you want running
ConfigMap = Configuration (not secrets)
Secret = Passwords, API keys, etc. (base64-encoded by default; enable encryption at rest for real protection)
Simple Kubernetes Deployment
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3 # Run 3 copies
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:latest
          ports:
            - containerPort: 3000

          # Health checks
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10

          # Resource limits
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"

          # Environment variables
          env:
            - name: LOG_LEVEL
              value: "info"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
```

```yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  type: LoadBalancer # Expose to internet
  selector:
    app: myapp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
```

What happens:

Kubernetes creates 3 pods running your app
If one crashes, it's replaced automatically
Load balancer distributes traffic
Rolling updates: New version gradually replaces old

Auto-scaling
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

Auto-scales from 3 to 10 pods based on CPU/memory usage.
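
To see how those targets turn into a pod count, here is a simplified sketch of the scaling rule described in the Kubernetes documentation: desired = ceil(current × currentUtilization / targetUtilization), taking the largest result across metrics and clamping to the min/max range. It deliberately ignores stabilization windows, pod readiness, and missing metrics, and the function name is hypothetical.

```javascript
// Simplified Horizontal Pod Autoscaler calculation.
// Each metric proposes a replica count; the largest proposal wins,
// then the result is clamped to [min, max].
function desiredReplicas(current, metrics, { min = 3, max = 10, tolerance = 0.1 } = {}) {
  const proposals = metrics.map(({ currentUtilization, targetUtilization }) => {
    const ratio = currentUtilization / targetUtilization;
    // Within the tolerance band around 1.0, leave the replica count unchanged
    if (Math.abs(ratio - 1) <= tolerance) return current;
    return Math.ceil(current * ratio);
  });
  const desired = Math.max(...proposals);
  return Math.min(max, Math.max(min, desired));
}

// 3 pods, CPU at 140% of requests (target 70%), memory at 40% (target 80%)
console.log(desiredReplicas(3, [
  { currentUtilization: 140, targetUtilization: 70 },
  { currentUtilization: 40, targetUtilization: 80 },
]));
```

In the example call, CPU proposes 6 replicas and memory proposes 2, so the busier metric decides the outcome.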

Part 3: Monitoring & Alerting
Deployed code that isn't monitored is just waiting to fail silently.
The Three Pillars of Observability

  1. Logs (What happened)

```javascript
// Structured logging
logger.info('User login', {
  userId: user.id,
  timestamp: new Date(),
  ipAddress: req.ip,
  duration: 245 // ms
});

// Output:
// {"level":"info","message":"User login","userId":"123","timestamp":"2024-05-20T...","ipAddress":"192.168.1.1","duration":245}
```
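
The snippet above assumes a `logger` object already exists. In production you would likely reach for an established logging library. As a rough sketch of what structured logging boils down to, a JSON logger can be a few lines. `createLogger` and its `write` parameter are hypothetical names.

```javascript
// Build a tiny JSON logger. `write` defaults to stdout; passing a custom
// writer makes the output easy to capture or redirect.
function createLogger(write = (line) => process.stdout.write(line + '\n')) {
  const log = (level) => (message, fields = {}) => {
    // One JSON object per line, so log tooling can parse and filter by field
    const entry = { level, message, timestamp: new Date().toISOString(), ...fields };
    write(JSON.stringify(entry));
    return entry;
  };
  return { info: log('info'), warn: log('warn'), error: log('error') };
}

const lines = [];
const logger = createLogger((line) => lines.push(line));
logger.info('User login', { userId: '123', ipAddress: '192.168.1.1', duration: 245 });
console.log(lines[0]);
```

Emitting one JSON object per line is what lets a log backend filter on `userId` or aggregate `duration` without regex parsing.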

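  2. Metrics (What's the state)

```javascript
// Application metrics (assumes prom-client and an Express `app`)
const { Histogram } = require('prom-client');

const httpDuration = new Histogram({
  name: 'http_request_duration_ms',
  help: 'Duration of HTTP requests in ms',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000] // ms; the defaults assume seconds
});

app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = Date.now() - start;
    // req.route is undefined when no route matched (e.g. a 404)
    const route = req.route ? req.route.path : 'unmatched';
    httpDuration.labels(req.method, route, res.statusCode).observe(duration);
  });
  next();
});
```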

  3. Traces (How requests flow)

```javascript
// Distributed tracing
const span = tracer.startSpan('database.query');
span.setTag('query', sql);
let result;
try {
  result = await db.query(sql);
} finally {
  span.finish(); // Close the span even if the query throws
}

// Shows: Request → Service A → Service B → Database
// Plus: Time spent at each step
```
Setting Up Alerts That Matter
```yaml
# Prometheus alert rules
groups:
  - name: app
    rules:
      - alert: HighErrorRate
        # Share of requests returning 5xx over the last 5 minutes
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        annotations:
          summary: "High error rate detected"
      - alert: DatabaseConnectionPoolExhausted
        expr: db_connections_active / db_connections_max > 0.9
        for: 2m
        annotations:
          summary: "Database connection pool 90% full"
      - alert: HighLatency
        # Quantiles are computed from the histogram's _bucket series
        expr: histogram_quantile(0.95, sum(rate(http_request_duration_ms_bucket[5m])) by (le)) > 1000
        for: 5m
        annotations:
          summary: "p95 latency > 1 second"
```

Key principle: Alert on symptoms users feel, not on raw resource facts.

❌ Alert: "CPU > 80%"
✅ Alert: "p95 latency > 1s" (high CPU matters only if users see it)
❌ Alert: "Disk 85% full"
✅ Alert: "Disk full in 24 hours at current rate" (gives time to act)
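
The "disk full in 24 hours" alert is a linear extrapolation. Prometheus offers `predict_linear` for this. The sketch below shows the same arithmetic in JavaScript using just two usage samples, which is an assumption made for brevity (a real rule would fit a trend over many samples). The function names are hypothetical.

```javascript
// Estimate hours until the disk is full from two usage samples, assuming
// usage keeps growing linearly at the observed rate.
function hoursUntilFull(older, newer, capacityBytes) {
  const elapsedHours = (newer.timeMs - older.timeMs) / 3600000;
  if (elapsedHours <= 0) throw new RangeError('samples must be in time order');
  const growthPerHour = (newer.usedBytes - older.usedBytes) / elapsedHours;
  if (growthPerHour <= 0) return Infinity; // flat or shrinking usage never fills the disk
  return (capacityBytes - newer.usedBytes) / growthPerHour;
}

// Fire the alert when the projected time to full is within the horizon.
function shouldAlert(older, newer, capacityBytes, horizonHours = 24) {
  return hoursUntilFull(older, newer, capacityBytes) <= horizonHours;
}

const GB = 1024 ** 3;
const older = { timeMs: 0, usedBytes: 80 * GB };
const newer = { timeMs: 4 * 3600000, usedBytes: 84 * GB };
console.log(hoursUntilFull(older, newer, 100 * GB), shouldAlert(older, newer, 100 * GB));
```

A disk at 84% that grows 1 GB per hour deserves attention sooner than a disk at 95% that has not changed in a month, which is why the rate matters more than the level.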

Part 4: Rollback & Recovery
Everything fails. What matters is how fast you recover.
Automated Rollback
```yaml
# GitHub Actions
- name: Deploy to production
  run: kubectl set image deployment/myapp myapp=myapp:v1.3.0

- name: Wait for rollout
  run: kubectl rollout status deployment/myapp --timeout=5m

- name: Run smoke tests
  run: npm run test:smoke

- name: If tests fail, rollback
  if: failure()
  run: kubectl rollout undo deployment/myapp
```
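
The same deploy, verify, and undo flow can be expressed as a small function, which is handy if you drive deployments from a script instead of workflow steps. This is a sketch under assumptions: the three callbacks are hypothetical stand-ins for the `kubectl` and smoke-test commands above.

```javascript
// Deploy, verify, and roll back on failure. All three callbacks are
// hypothetical stand-ins for the kubectl and smoke-test steps.
async function deployWithRollback({ deploy, smokeTest, rollback }) {
  try {
    await deploy();
    const passed = await smokeTest();
    if (!passed) throw new Error('smoke tests failed');
    return { status: 'deployed' };
  } catch (err) {
    // Any failure in deploy or verification triggers the rollback
    await rollback();
    return { status: 'rolled_back', reason: err.message };
  }
}
```

A failed rollout and a failed smoke test are handled the same way here. Both end in a rollback, matching the `if: failure()` step.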
Database Rollback
```bash
# If you deployed a database migration that breaks

# Option 1: Have down migration (safe)
npm run migrate:down
npm run migrate:up # New fixed version

# Option 2: Restore from a snapshot
# (for true point-in-time recovery, use restore-db-instance-to-point-in-time)
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier prod-db-restored \
  --db-snapshot-identifier prod-db-2024-05-20-03-00
```
The Runbook
Create this before you need it:
```markdown
# Incident Runbook: Database is Slow

## Symptoms

- p95 latency > 5s
- Users complain app is slow
- CPU on database high

## Immediate Actions (0-5 min)

1. Check if we can scale database (vertical scale)
2. Find long-running queries and terminate them (in PostgreSQL: list them via pg_stat_activity, stop them with pg_terminate_backend)
3. If the slow query came from a recent deploy, roll back

## Root Cause (5-30 min)

1. Check recent deployments: When did this start?
2. Check slow query log
3. Check if query plan changed

## Resolution

- Option A: Optimize query (add index)
- Option B: Rollback problematic deploy
- Option C: Scale database

## Prevention

- Add query time monitoring
- Add alert for p95 latency
- Load test before deploy
```

Part 5: Common Mistakes (And How to Avoid Them)
Mistake 1: Deploying Changes That Are Too Big
❌ 10,000 lines of code changed in one deploy
✅ 200-500 lines per deploy
Reason: When something breaks, you know exactly what caused it.
Mistake 2: No Rollback Plan
❌ "We're committed now"
✅ Every deploy has a rollback procedure
Mistake 3: Testing Manually
❌ "We'll test in staging by hand"
✅ Automated tests run before every deploy
Mistake 4: Ignoring Logs/Metrics
❌ "The app is running, who cares about logs?"
✅ Structured logging and metrics from day 1
Mistake 5: Same Config Everywhere
❌ Production and staging use same database
✅ Separate infrastructure, separate secrets, separate configs

The DevOps Checklist
Before you call it "production-ready":
✅ CI/CD Pipeline: Every commit triggers tests & build
✅ Automated Tests: Unit + integration tests pass before deploy
✅ Containerized: Docker image with multi-stage build
✅ Orchestrated: Runs on Kubernetes or managed service
✅ Health Checks: Liveness & readiness probes configured
✅ Monitoring: Logs, metrics, traces all flowing
✅ Alerts: Meaningful alerts (not noise)
✅ Rollback Plan: Can recover in < 5 minutes
✅ Secrets Management: Passwords never in code
✅ Documentation: Runbooks for common issues

Tools You'll Use

CI/CD: GitHub Actions, GitLab CI, Jenkins
Containerization: Docker, Podman
Orchestration: Kubernetes, Docker Swarm, AWS ECS
Monitoring: Prometheus, Datadog, New Relic
Logging: ELK Stack, Splunk, CloudWatch
Tracing: Jaeger, Zipkin, Datadog

Next Steps

Set up CI/CD: Start with GitHub Actions (free)
Containerize your app: Write a Dockerfile
Deploy to staging: Use Docker Compose locally, Kubernetes in cloud
Add monitoring: Start with basic metrics
Create runbooks: Document how to handle failures
Practice rollbacks: Actually execute a rollback (in staging first)

Resources

GitHub Actions Docs
Kubernetes Documentation
Docker Best Practices
Prometheus Monitoring

Master DevOps Practices
At Vector Skill Academy, we teach DevOps the way production teams do it. Automation. Consistency. Reliability.
Explore our DevOps & Deployment program