From 270GB RAM to 5GB: Moving local flag evaluation from Django to Rust
Patricio Tar · 2026-05-25 · via PostHog's RSS Feed

I reloaded Grafana three times before I trusted the numbers. p50 latency: 40ms to 4ms. CPU usage: a fraction of before. Memory: barely there. We'd just moved the feature flags local evaluation endpoint from Django to Rust. I knew it would be better. But I wasn't ready for this.

Local evaluation is a configurable endpoint that server-side SDKs hit to fetch feature flag definitions. SDKs poll it every 30 seconds by default, so it gets a lot of traffic. Until recently it had its own Django deployment, sized for that traffic: about 30 pods on average in US, with autoscaling configured up to 250. Each pod requested 2 CPU cores and 9 GB of memory, so at baseline we were running 60 cores and 270 GB of RAM.

The endpoint reads cached flag definitions out of Redis, checks auth, and returns JSON. There is no flag evaluation logic, and no database queries on the hot path. The Rust feature flags service next door was already serving around 13,000 requests per second on /flags and /decide, doing the actual compute, so moving the cache read over to live in the same codebase felt overdue.

Porting the endpoint itself was the easy part. Most of the time went into matching Django's behavior around the edges.

Auth was trickiest. The endpoint's Django view declared one authentication method, but a shared mixin added more behind the scenes. I noticed when Rust rejected requests that Django accepted. A related quirk: Django returned 403 (authenticated but not permitted) for some requests where Rust returned 401 (not authenticated). Same outcome for the client, but the codes mean different things, and our dashboards disagreed about what was happening. Matching Django's behavior took a few small changes on the Rust side.
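To make the 401-vs-403 distinction concrete, here is a minimal sketch. The token table, the `auth_status` function, and the team check are all hypothetical stand-ins for illustration, not the real Django or Rust auth code.

```python
from http import HTTPStatus
from typing import Optional

# Hypothetical token table for illustration; not PostHog's real data model.
KNOWN_TOKENS = {"secret-a": {"team": 1}, "secret-b": {"team": 2}}


def auth_status(token: Optional[str], requested_team: int) -> HTTPStatus:
    """Return the status code an endpoint would send for this caller."""
    if token is None or token not in KNOWN_TOKENS:
        # We cannot tell who the caller is: not authenticated.
        return HTTPStatus.UNAUTHORIZED  # 401
    if KNOWN_TOKENS[token]["team"] != requested_team:
        # We know who the caller is, but they may not read this team's flags.
        return HTTPStatus.FORBIDDEN  # 403
    return HTTPStatus.OK  # 200
```

If two services disagree on which branch a given request falls into, the client is refused either way, but the per-status-code graphs for the two services stop lining up.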

ETags were less painful. Instead of computing our own on the Rust side (which would have required byte-identical serialization with Django), Rust reads Django's stored ETag from Redis. That side-steps the serialization problem while the two services coexist. 54% of requests return 304 Not Modified. Over half of all SDK polls move zero bytes of flag data.
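A rough sketch of that conditional-request flow, assuming a plain dict in place of Redis and made-up key names. It also simplifies `If-None-Match` to a single exact value, which real HTTP does not require.

```python
from typing import Dict, Optional, Tuple

# A dict stands in for Redis here; the key names are hypothetical.
cache: Dict[str, str] = {
    "flags:team:1:body": '{"flags": []}',
    "flags:team:1:etag": '"abc123"',
}


def respond(team_id: int, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for one definitions poll."""
    # Reuse the ETag that was stored alongside the payload instead of
    # hashing the body again, so both services hand out the same value.
    etag = cache.get(f"flags:team:{team_id}:etag")
    headers = {"ETag": etag} if etag else {}
    if etag is not None and if_none_match == etag:
        # The client already holds this version: send no body.
        return 304, headers, ""
    return 200, headers, cache[f"flags:team:{team_id}:body"]
```

A poll that sends the current ETag gets a 304 with an empty body; anything else gets the full cached payload.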

Billing and rate limiting were the boring half: same Redis counters, same allowlist, same quota response codes, just reimplemented on the Rust side so we could match Django's outputs request-for-request.

SDK polling is bursty and predictable. Flag evaluation is spiky and latency-sensitive. Putting both kinds of workload on the same pods felt wrong, so we split the Rust service into two fleets using a SERVICE_MODE environment variable. One fleet runs in flags mode and serves /flags and /decide. The other runs in definitions mode and serves /flags/definitions. Both share the same secret and the same codebase, with the mode just controlling which routes get registered.
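The idea of one codebase with mode-dependent routes can be sketched like this. It is written in Python for brevity, while the real service is Rust. The handler names, the literal mode strings, and the default mode are assumptions.

```python
import os
from typing import Callable, Dict, Mapping

Handler = Callable[[], str]


# Placeholder handlers; the real ones live in the Rust service.
def handle_flags() -> str:
    return "evaluated flags"


def handle_decide() -> str:
    return "decide response"


def handle_definitions() -> str:
    return "cached flag definitions"


# The mode strings are assumed; the post only names the two modes.
ROUTES_BY_MODE: Dict[str, Dict[str, Handler]] = {
    "flags": {"/flags": handle_flags, "/decide": handle_decide},
    "definitions": {"/flags/definitions": handle_definitions},
}


def build_routes(env: Mapping[str, str] = os.environ) -> Dict[str, Handler]:
    """Register only the routes that belong to this fleet's mode."""
    mode = env.get("SERVICE_MODE", "flags")  # the default is an assumption
    if mode not in ROUTES_BY_MODE:
        raise ValueError(f"unknown SERVICE_MODE: {mode!r}")
    return dict(ROUTES_BY_MODE[mode])
```

With this shape, a pod in one mode simply has no handler for the other fleet's paths, so a misrouted request fails fast instead of competing for the same resources.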

The rollout went in three stages. We pointed our own internal SDKs at the Rust endpoint first, diffed responses with curl, then used Contour's weighted routing to shift external traffic: 10%, then 50%, then 100% over two days. Then we updated all seven server-side SDKs to poll /flags/definitions directly, one PR per SDK, with the old URL still working through a route alias for older customers.
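The response diffing was done by hand with curl. A scripted version of the same check might look like the sketch below; the response dict shape and the `diff_responses` helper are hypothetical, and this is not the tooling we actually ran.

```python
import json
from typing import Any, Dict, List


def diff_responses(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """List the ways two captured responses disagree."""
    problems: List[str] = []
    if old["status"] != new["status"]:
        problems.append(f"status: {old['status']} != {new['status']}")
    if old["headers"].get("ETag") != new["headers"].get("ETag"):
        problems.append("ETag differs")
    # Compare parsed JSON so key order and whitespace do not count as diffs.
    # An empty body (for example on a 304) is treated as JSON null.
    if json.loads(old["body"] or "null") != json.loads(new["body"] or "null"):
        problems.append("body differs")
    return problems


django = {"status": 403, "headers": {"ETag": '"v1"'}, "body": '{"a": 1, "b": 2}'}
rust = {"status": 401, "headers": {"ETag": '"v1"'}, "body": '{"b": 2, "a": 1}'}
report = diff_responses(django, rust)
```

Here `report` contains only the status mismatch, because the two bodies are the same JSON with different key order.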

Three small bugs surfaced during the weighted rollout:

  1. A missing token parameter that Django had been accepting silently.
  2. The 401-vs-403 mismatch from the auth quirk above, which I caught by lining up the two services' status code rates side by side in Grafana.
  3. A bug in the mixed targeting beta: the Python SDK was reading aggregation_group_type_index only at the flag level, so flags that mixed user and group conditions were treated as person-only. Group conditions failed locally and fell back to a server-side call. We fixed it in the Python SDK and left the others for follow-up.
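The third bug is easier to see in code. This sketch assumes a flag shape where each condition can carry its own `aggregation_group_type_index`. The exact nesting, the flag key, and both helper functions are illustrative assumptions, not the Python SDK's actual code or the actual fix.

```python
from typing import Any, Dict, Optional

# Hypothetical flag definition mixing a person condition and a group condition.
flag: Dict[str, Any] = {
    "key": "example-flag",
    "filters": {
        "aggregation_group_type_index": None,  # flag level: no single group type
        "groups": [
            {"properties": [], "aggregation_group_type_index": None},  # person
            {"properties": [], "aggregation_group_type_index": 0},  # group
        ],
    },
}


def group_type_flag_level_only(flag: Dict[str, Any], condition: Dict[str, Any]) -> Optional[int]:
    """Buggy lookup: ignores the condition and reads the flag-level value."""
    return flag["filters"].get("aggregation_group_type_index")


def group_type_per_condition(flag: Dict[str, Any], condition: Dict[str, Any]) -> Optional[int]:
    """Prefer the condition's own value, falling back to the flag level."""
    if "aggregation_group_type_index" in condition:
        return condition["aggregation_group_type_index"]
    return flag["filters"].get("aggregation_group_type_index")


group_condition = flag["filters"]["groups"][1]
```

With the flag-level-only lookup, the group condition comes back as `None` and is treated like a person condition. The per-condition lookup returns `0` for it.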

The Rust definitions fleet now handles around 282 requests per second in US on 5 pods. Each pod requests 500m CPU and a bit under 1 GB of memory – 2.5 cores and around 5 GB total, against the 60 cores and 270 GB we used to spend. That's a 24x reduction in CPU and 56x in memory.

| | Django (before) | Rust (after) |
|---|---|---|
| Pods (US, average) | ~30 (peak 43) | 5 |
| CPU request per pod | 2,000m | 500m |
| Memory request per pod | 9,000 MB | 954 MB |
| Total CPU (at avg scale) | 60 cores | 2.5 cores |
| Total memory (at avg scale) | 270 GB | 4.8 GB |
| PGBouncer sidecars | Yes | No |
| Dedicated node pool | Yes (memory-optimized) | No (shared pool) |
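The totals and reduction factors follow directly from the per-pod requests. A quick check, using the average pod counts and treating 1 GB as 1,000 MB (which is what 30 pods at 9,000 MB adding up to 270 GB implies):

```python
# Per-pod requests and average pod counts, taken from the figures above.
django = {"pods": 30, "cpu_millicores": 2000, "mem_mb": 9000}
rust = {"pods": 5, "cpu_millicores": 500, "mem_mb": 954}


def totals(fleet):
    """Return (total cores, total memory in GB) requested by a fleet."""
    cores = fleet["pods"] * fleet["cpu_millicores"] / 1000
    mem_gb = fleet["pods"] * fleet["mem_mb"] / 1000  # 1 GB = 1,000 MB here
    return cores, mem_gb


django_cores, django_gb = totals(django)  # 60.0 cores, 270.0 GB
rust_cores, rust_gb = totals(rust)        # 2.5 cores, 4.77 GB

cpu_ratio = django_cores / rust_cores     # 24.0
mem_ratio = django_gb / rust_gb           # between 56 and 57
```

The memory ratio lands between 56x and 57x depending on whether you divide by 4.77 GB or the rounded 4.8 GB.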

Latency improved too. These are Envoy-layer measurements: pod response time, no client network. A customer's SDK sees this plus its own round-trip. Same Prometheus metric (envoy_cluster_upstream_rq_time) for both services, so it's apples to apples.

| | Django | Rust | Improvement |
|---|---|---|---|
| p50 | 40 ms | 4 ms | 10x faster |
| p95 | 95 ms | 20 ms | 4.7x faster |
| p99 | 170 ms | 37 ms | 4.6x faster |
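The improvement column is just the Django latency divided by the Rust latency at each percentile:

```python
# (Django ms, Rust ms) per percentile, from the latency figures above.
latency_ms = {"p50": (40, 4), "p95": (95, 20), "p99": (170, 37)}

speedup = {pct: before / after for pct, (before, after) in latency_ms.items()}
# p50 is exactly 10.0, p95 is exactly 4.75, p99 is a little under 4.6
```

The gain is largest at the median and smaller in the tail.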

The cache hit rate sits at 99.98%, which means almost every request is served from Redis with zero Postgres on the hot path. The dedicated Karpenter pool of memory-optimized instances is gone.

I keep coming back to the rollout. We could shift 10% of traffic, sit with the metrics for a day, and roll back instantly if something looked off. That's what made this safe to ship fast. Cut over in one step and customer support tickets would have done the testing. I'd have been firefighting instead of writing this.

If you want to read more about this topic, check out our blog on Untangling Tokio and Rayon in production which covers an earlier optimization on the same service, or How we made feature flags even faster and more reliable for the original migration that this one builds on.