Public postmortem: external events backlog
Justin Duke · 2025-03-27

A misconfiguration on one of our self-hosted SMTP servers led to a crash that was difficult to recover from, causing many emails to become “stuck” in the system. The effects varied—some emails were delayed, some were sent multiple times, and some never went out at all. We have since corrected the configuration, are actively investing in improved tooling and alerting, and are building safeguards to prevent this kind of situation going forward.

What happened?

Buttondown uses various providers to deliver emails from authors to subscribers. Alongside third-party vendors, we also run our own fleet of servers for this purpose—referred to here as postal servers, in reference to the open source project we rely on.

On a recent Wednesday morning, we received automated alerts from our monitoring system indicating an unusual backlog of emails. Further investigation showed that one postal server was taking an excessive amount of time to send each message, eventually reaching a point where it did little but time out repeatedly. Logging into the server, we identified the culprit: a database handling pending messages was malfunctioning.

While we initially suspected overall message volume as the problem, we discovered the real issue was excessive connection attempts to the database from too many worker threads. The database was not configured to recover cleanly from this, nor to properly alert downstream connections about the situation. Our immediate solution was straightforward: we rebooted the database, scaled down worker count, and restored connections to a manageable state.
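To illustrate the failure mode described above, here is a minimal sketch of one way to cap how many worker threads may hold a database connection at once, so that extra workers back off instead of piling more connection attempts onto a struggling database. It is not the configuration we run in production: `BoundedConnectionGate`, its parameters, and the `connect` callable are hypothetical names chosen for this example.

```python
import threading
from contextlib import contextmanager


class BoundedConnectionGate:
    """Hypothetical helper: caps how many workers hold a connection at once."""

    def __init__(self, max_connections: int, acquire_timeout: float = 1.0):
        self._slots = threading.BoundedSemaphore(max_connections)
        self._timeout = acquire_timeout

    @contextmanager
    def connection(self, connect):
        # `connect` is any zero-argument callable returning an object with
        # a close() method (an assumption made for this sketch).
        # Fail fast rather than queueing another connection attempt.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("no free connection slot; back off and retry")
        try:
            conn = connect()
            try:
                yield conn
            finally:
                conn.close()
        finally:
            # Always free the slot, even if connect() itself failed.
            self._slots.release()
```

With a gate like this, the number of workers can grow without the number of simultaneous database connections growing with it; a worker that cannot get a slot receives a `TimeoutError` it can handle by waiting and retrying.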

This left us with a challenging recovery: about 70,000 messages were stuck in limbo. Some were marked as pending but had in fact been sent; others were marked as sent but had never gone out.

Essentially, we entered a state where we couldn’t trust our sources of truth. Our standard operating procedure in such cases is to act conservatively. This meant isolating the affected server, shutting down its workers, leaving its messages as pending, and shifting traffic elsewhere—ensuring we didn’t worsen the situation or make decisions based on unreliable information.

This is what we did: traffic was shifted, the backlog queue was drained, and we resent only those emails we were certain had not gone out. Once the problematic server was cleared, we returned it to service.
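The conservative rule above (resend only what certainly did not go out) can be sketched as a small decision function. This is an illustration under stated assumptions, not our actual recovery tooling: the `StuckMessage` fields, the `Action` names, and the idea that independent evidence of an SMTP handoff is available are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RESEND = "resend"        # certain it never went out
    MARK_SENT = "mark_sent"  # certain it was delivered to the next hop
    HOLD = "hold"            # ambiguous; leave as pending for manual review


@dataclass(frozen=True)
class StuckMessage:
    message_id: str
    queue_status: str        # "pending" or "sent", per the untrusted queue database
    handoff_attempted: bool  # assumed evidence: a worker began the SMTP handoff
    remote_accepted: bool    # assumed evidence: the receiving server acknowledged it


def reconcile(msg: StuckMessage) -> Action:
    # queue_status is deliberately ignored: it is the source of truth
    # that could not be trusted during the incident.
    if msg.remote_accepted:
        return Action.MARK_SENT
    if not msg.handoff_attempted:
        return Action.RESEND
    # A handoff started but was never acknowledged: resending risks a duplicate.
    return Action.HOLD
```

The design choice is that the ambiguous case defaults to doing nothing: a held message can still be resent later, whereas a duplicate cannot be recalled.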

How are we fixing it?

The first step, already in progress as of this post, is to implement much more rigorous monitoring and alerting. Previously, we relied on broad integration-level metrics, which suffice for well-defined, obvious problems, but not for more nuanced or structural issues.

To be specific: we already had per-server alerts for pending or stuck messages, but these relied on an active database connection—which we didn’t have during this incident.
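One way to close the gap described above is to treat a failed measurement as an alert in its own right, rather than as the absence of one. The sketch below is a simplified illustration, not our monitoring code: `check_queue_health`, the `Health` states, and the `count_pending` callable are hypothetical names.

```python
from enum import Enum
from typing import Callable


class Health(Enum):
    OK = "ok"
    BACKLOG = "backlog"          # the queue is reachable but too deep
    UNREACHABLE = "unreachable"  # the queue could not be measured at all


def check_queue_health(count_pending: Callable[[], int],
                       backlog_threshold: int) -> Health:
    # `count_pending` is assumed to query the queue database and to raise
    # an exception when no connection can be made.
    try:
        pending = count_pending()
    except Exception:
        # Not being able to measure is itself a condition worth paging on.
        return Health.UNREACHABLE
    if pending > backlog_threshold:
        return Health.BACKLOG
    return Health.OK
```

A monitor built this way would page on both `BACKLOG` and `UNREACHABLE`, so losing the database connection makes the alert louder rather than silent.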

The broader effort is to give authors better visibility into delivery patterns. One of the worst experiences as an author is seeing an email marked as sent, but never receiving it. We intend to be more transparent about email states, so you can look into the dashboard and understand if delays or problems are happening.

Customer impact

During this incident, approximately 13,000 subscribers across 40 authors were affected. They experienced:

  • Delays of multiple hours before receiving messages
  • Not receiving emails at all (although we have since retried these deliveries)
  • Receiving duplicate emails

Looking ahead

Frankly, we’ve experienced too many incidents recently.

We’ve spent the last six months fixing bugs and improving stability at a granular level, but haven’t invested enough in end-to-end, infrastructure-wide reliability. These last few weeks have emphasized the need for this. Our primary responsibility is to reliably deliver your writing, and we’re now dedicating significant resources over the next six months to improve our observability, diagnostics, and resolution capabilities. If you’ve read this far, it’s probably out of frustration, not just curiosity—we know you’ve entrusted us with your work, and when we fall short, we take it seriously. We’re addressing this with urgency and commitment.