Designing TikTok from Scratch — A System Design Deep Dive
Daniel Keya · 2026-05-26 · via DEV Community

Who is this for? Mid-to-senior engineers preparing for system design interviews, or anyone curious how a short-video platform at billion-user scale actually works under the hood.


Scale We're Designing For

Metric | Number
Monthly active users | 1B+
Videos uploaded per day | ~34 million
Target feed latency (P99) | ~167 ms
Average egress bandwidth | ~31 Tbps

1. Requirements

Before drawing a single box, nail down what the system must do — and what it doesn't need to do perfectly on day one.

Functional requirements:

  • Upload and transcode short videos
  • Serve a personalized "For You" feed
  • Like, comment, share, follow
  • Search videos and creators
  • Live streaming

Non-functional requirements:

  • High availability (99.99% uptime)
  • Sub-200ms feed latency
  • Horizontal scalability
  • Global CDN video delivery
  • Eventual consistency for counters and feeds, strong consistency for auth and billing

2. High-Level Architecture

The system splits into four major domains: ingestion (upload pipeline), serving (read path), recommendation (ML feed), and social graph.

┌─────────────────────────────────────────────────┐
│              Mobile / Web Clients                │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│         Global CDN / Edge PoPs                   │
│   Video delivery, static assets, geo-routing    │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│       API Gateway + Load Balancer                │
│   Auth, rate limiting, routing, TLS termination │
└────────┬────────────┴────────────────┬──────────┘
         │                             │
   ┌─────▼──────┐  ┌──────────────┐  ┌▼────────────────┐
   │  Upload    │  │ Feed Service │  │  Social Graph   │
   │  Service   │  │(pre-compute  │  │    Service      │
   │            │  │ + real-time) │  │                 │
   └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
         │                │            │
   ┌─────▼──────┐  ┌──────▼───────┐  ┌▼────────────────┐
   │ Transcode  │  │Recommendation│  │  Notification   │
   │  Workers   │  │   Engine     │  │    Service      │
   └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
         │                │            │
   ┌─────▼──────┐  ┌──────▼───────┐  ┌▼────────────────┐
   │  Object    │  │ Feature Store│  │  Search Service │
   │  Storage   │  │(Redis+Cassie)│  │ (Elasticsearch) │
   └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
         │                │            │
┌────────▼────────────────▼────────────▼──────────────┐
│              Async Message Bus (Kafka)               │
└──────────┬──────────────┬──────────────┬────────────┘
           │              │              │
    ┌──────▼─────┐ ┌──────▼────┐ ┌──────▼──────┐
    │MySQL/Vitess│ │   Redis   │ │  Cassandra  │
    │(user data, │ │ (counters,│ │ (timelines, │
    │ metadata)  │ │  cache)   │ │  history)   │
    └────────────┘ └───────────┘ └─────────────┘


On non-critical paths, services communicate asynchronously via Kafka.


3. Key Components Explained

CDN + Edge PoPs

TikTok's secret weapon. ~70% of video traffic is served directly from edge nodes in 150+ cities, bypassing origin entirely. It uses Anycast routing to send users to the nearest PoP. Manifest files (playlist URLs) are invalidated within seconds of a video going viral.

Upload Pipeline

Chunked multi-part upload (5 MB chunks) tolerates flaky mobile connections. Workers dedup via SHA-256 before writing. Transcode jobs run on GPU fleets — outputs include 360p, 720p, 1080p, and HEVC variants. Thumbnails and stills are extracted for ML feature generation.
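The chunk-then-dedup step can be sketched as follows. This is a minimal illustration, not TikTok's implementation: `DedupStore` is a hypothetical in-memory stand-in for object storage, and hashing the whole reassembled file (rather than individual chunks) is an assumption.

```python
import hashlib

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, the chunk size quoted above


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split an upload into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class DedupStore:
    """Hypothetical in-memory stand-in for object storage keyed by content hash."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, chunks: list[bytes]) -> tuple[str, bool]:
        """Hash the chunks in order and write only if the hash is new.

        Returns (sha256_hex, was_written).
        """
        digest = hashlib.sha256()
        for chunk in chunks:
            digest.update(chunk)  # streaming hash over the chunks as they arrive
        key = digest.hexdigest()
        if key in self._blobs:
            return key, False  # duplicate upload: skip the write
        self._blobs[key] = b"".join(chunks)
        return key, True
```

Because each chunk is independent, a client on a flaky connection only needs to resend the chunk that failed, not the whole file.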

Recommendation Engine

A two-tower neural network:

  • Tower 1 — encodes user state (watch history, device, time of day, location)
  • Tower 2 — encodes video features (visual embeddings, audio, caption text)

Dot product gives a relevance score. The model runs online for top-k retrieval, then a ranker applies real-time signals (trending, friend activity) before the feed is assembled.
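To make the scoring step concrete, here is a small sketch of dot-product retrieval. The embeddings are made-up numbers and `top_k` is a hypothetical helper; it scans every video, whereas a system at this scale would more plausibly use an approximate nearest-neighbor index.

```python
import heapq


def dot(u: list[float], v: list[float]) -> float:
    """Relevance score: dot product of a user embedding and a video embedding."""
    return sum(a * b for a, b in zip(u, v))


def top_k(user_vec: list[float],
          video_vecs: dict[str, list[float]],
          k: int) -> list[tuple[str, float]]:
    """Return the k highest-scoring (video_id, score) pairs, best first."""
    scored = ((vid, dot(user_vec, vec)) for vid, vec in video_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy example: a user whose embedding leans entirely toward the first dimension.
user = [1.0, 0.0]
videos = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5]}
candidates = top_k(user, videos, k=2)  # "a" ranks first, then "c"
```

The candidates returned here would then go to the ranker, which is where the real-time signals come in.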

Feed Assembly (Pre-compute + Real-time Merge)

This is where TikTok differs from Twitter/Instagram:

  • Regular accounts: fan-out on write (posts pushed to follower inboxes eagerly)
  • Celebrity/high-follower accounts: fan-out on read (merged at request time, which avoids millions of inbox writes per post)

The feed service merges both lists, injects ML-recommended videos, and applies diversity rules to avoid repetition. Final feed is cached in Redis with a 300s TTL.
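A rough sketch of the merge step, under stated assumptions: items are `(video_id, creator_id)` tuples, the diversity rule is a simple per-creator cap, and `TTLCache` is a hypothetical in-memory stand-in for the Redis feed cache. None of these names come from the original design.

```python
import time
from itertools import zip_longest

FEED_TTL_SECONDS = 300  # the TTL quoted above


def merge_feed(inbox, pulled, recommended, max_per_creator=2, limit=10):
    """Interleave three candidate lists, dropping duplicate videos and capping
    how many videos a single creator contributes (a simple diversity rule)."""
    feed, seen, per_creator = [], set(), {}
    for group in zip_longest(recommended, inbox, pulled):
        for item in group:
            if item is None:  # this source list is exhausted
                continue
            video_id, creator_id = item
            if video_id in seen or per_creator.get(creator_id, 0) >= max_per_creator:
                continue
            seen.add(video_id)
            per_creator[creator_id] = per_creator.get(creator_id, 0) + 1
            feed.append(video_id)
            if len(feed) == limit:
                return feed
    return feed


class TTLCache:
    """In-memory stand-in for a cache whose entries expire after a fixed TTL."""

    def __init__(self, ttl=FEED_TTL_SECONDS, clock=time.monotonic):
        self._ttl, self._clock, self._data = ttl, clock, {}

    def set(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or self._clock() >= entry[0]:
            self._data.pop(key, None)  # expired or missing
            return None
        return entry[1]
```

A short TTL like this trades some freshness for a large drop in load on the recommendation backend, since repeated refreshes within five minutes are served from cache.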

Kafka Message Bus

All write events (upload complete, like, follow, watch-complete) are published to Kafka topics. Downstream consumers include:

  • Analytics pipeline
  • Notification fan-out
  • ML feature store updater
  • Search indexer

Topics are partitioned by user_id for ordered processing per user. This decouples services and allows independent scaling.
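The ordering guarantee follows from the key-to-partition mapping: every event with the same key lands in the same partition, and a partition is an ordered log. The sketch below illustrates that idea only. `partition_for` and `route` are hypothetical helpers, and CRC32 is a stand-in, since Kafka clients apply their own key hash.

```python
import zlib
from collections import defaultdict


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Append each (user_id, event) pair to its partition, preserving arrival order."""
    partitions = defaultdict(list)
    for user_id, event in events:
        partitions[partition_for(user_id, num_partitions)].append((user_id, event))
    return dict(partitions)


events = [("u1", "like"), ("u2", "follow"), ("u1", "watch-complete")]
routed = route(events, num_partitions=12)
```

Events from different users may interleave within a partition, but the relative order of one user's events is never lost.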

Database Strategy

Store | Use Case | Why
MySQL / Vitess | User profiles, video metadata, social graph | ACID, sharded by user_id
Redis Cluster | Counters (likes, views), session tokens, feed cache | Sub-millisecond reads
Cassandra | Watch history, timelines, notification logs | Wide-row reads, high write throughput

4. Key Design Trade-offs

Fan-out on Write vs Read

The classic dilemma in social feed systems. TikTok uses a hybrid approach (the "celebrity problem" split):

Fan-out on write (for regular accounts):

  • Read path is O(1): just read the inbox
  • Fast feed assembly at serving time
  • Massive write amplification if applied to a celebrity's posts, which is why those accounts are excluded

Fan-out on read (for accounts with millions of followers):

  • No write amplification on post
  • Storage-efficient
  • Slower feed assembly for readers who follow thousands of accounts
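The two costs in these lists can be put into numbers with a toy model. The function names and the example figures (a 50M-follower account, a reader following 2,000 accounts) are illustrative assumptions, not measurements.

```python
def write_fanout_cost(follower_count: int, posts: int = 1) -> int:
    """Inbox writes triggered when an account's posts are pushed to every follower."""
    return follower_count * posts


def read_fanout_cost(following_count: int, feed_reads: int = 1) -> int:
    """Per-author timeline lookups when a reader's feed is merged at request time."""
    return following_count * feed_reads


# One post from a 50M-follower account, pushed eagerly: 50,000,000 inbox writes.
celebrity_post_writes = write_fanout_cost(50_000_000)

# A reader following 2,000 accounts who refreshes 20 times a day: 40,000 lookups.
heavy_reader_lookups = read_fanout_cost(2_000, feed_reads=20)
```

The write cost scales with the author's audience, while the read cost scales with the reader's follow list, which is why a hybrid splits accounts by follower count.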

Eventual vs Strong Consistency

Like/view counts can lag by a few seconds — nobody notices. But user authentication tokens and billing events require strong consistency. TikTok segments these into separate storage tiers with different consistency guarantees, accepting complexity for throughput on hot paths.

Push vs Pull for Notifications

Likes and comments use WebSocket push for real-time delivery. Less critical notifications (weekly summaries, suggested follows) use a pull-based batch pipeline that runs every few hours — no need to maintain a persistent connection for a weekly digest email.


5. Back-of-Envelope Estimates

Assumptions: 1B MAU, 500M DAU, avg user watches 45 min/day, avg video = 30 sec ~= 8 MB (720p). 34M uploads/day ~= 400 uploads/sec on average (34M / 86,400 s); peak rates will be higher.

Storage:

34M uploads/day x 8 MB x 3 resolutions = ~816 TB/day of new video
With 3x replication over 5 years = 816 TB x 365 x 5 x 3 = ~4.5 EB total raw storage


Feed reads:

500M DAU x 20 feed refreshes/day / 86,400 sec = ~115,000 feed reads/sec
With 95% Redis cache hit rate -> recommendation backend sees ~5,750 rps


Bandwidth:

500M users x 45 min x 60 s/min x 2 Mbps (720p) / 86,400 s = ~31 Tbps average egress (peak is higher)


This is why TikTok operates its own backbone in many regions and has deep-peering agreements with major ISPs.
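The estimates in this section can be recomputed from the stated assumptions with a few lines of Python. The function names are hypothetical, and decimal units (1 TB = 10^6 MB, 1 EB = 10^6 TB) are assumed throughout.

```python
SECONDS_PER_DAY = 86_400


def storage_estimate(uploads_per_day, mb_per_video, renditions, replication, years):
    """Return (new TB per day before replication, total EB after replication)."""
    tb_per_day = uploads_per_day * mb_per_video * renditions / 1_000_000  # MB -> TB
    eb_total = tb_per_day * 365 * years * replication / 1_000_000         # TB -> EB
    return tb_per_day, eb_total


def feed_reads_per_sec(dau, refreshes_per_day, cache_hit_rate):
    """Return (total feed reads/sec, reads/sec that miss the cache)."""
    total = dau * refreshes_per_day / SECONDS_PER_DAY
    return total, total * (1 - cache_hit_rate)


def average_egress_tbps(dau, watch_minutes, mbps):
    """Average egress over a day: mean concurrent viewers times per-stream bitrate."""
    concurrent_viewers = dau * watch_minutes * 60 / SECONDS_PER_DAY
    return concurrent_viewers * mbps / 1_000_000  # Mbps -> Tbps


uploads_per_sec = 34_000_000 / SECONDS_PER_DAY
tb_per_day, eb_total = storage_estimate(34_000_000, 8, 3, replication=3, years=5)
reads, misses = feed_reads_per_sec(500_000_000, 20, cache_hit_rate=0.95)
egress = average_egress_tbps(500_000_000, 45, mbps=2)
```

With these inputs the results are roughly 394 uploads/sec, 816 TB/day, about 4.47 EB over five years, about 115,700 feed reads/sec with about 5,800 reaching the backend, and 31.25 Tbps of egress. All of these are day-long averages, so real peaks would sit above them.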


6. What Makes TikTok's Architecture Special?

Most social platforms optimize for social graph traversal — show me what people I follow posted. TikTok inverted this: the algorithm is the product. The architecture is built around a recommendation pipeline that must be both blazing-fast and constantly learning from watch signals.

Three things stand out:

  1. Aggressive edge caching — they push video delivery as close to the user as physically possible. The CDN is not a performance optimization; it is the entire delivery strategy.

  2. Real-time ML feedback loops — a video's trajectory is decided in the first 30 minutes based on completion rate signals. A new creator can go viral without any followers.

  3. Microservice isolation — upload, serving, recommendation, and social graph are independently deployable and scalable, preventing any single bottleneck from cascading.


Interview Tips

If you're using this for a system design interview:

  1. Start with requirements — always clarify scale before designing anything
  2. Estimate first — back-of-envelope math shows you understand the constraints
  3. Sketch the high-level diagram — then dive into the component your interviewer cares about
  4. Talk through trade-offs — interviewers want reasoning, not a list of technologies
  5. Bottleneck hunt — proactively identify where the system will break and how you'd fix it

Found this useful? Follow for more system design deep dives — next up: designing YouTube's upload pipeline at scale.