Charting the Unknown: An Observationalist’s Log

If Neil Armstrong Were Your Engineer, You Wouldn’t Need Alerts
Robert · 2026-05-12 · via Charting the Unknown: An Observationalist’s Log

 Apollo 11 didn’t lack insight — but it only succeeded when the right action was chosen. 

“That’s one small step for [a] man, one giant leap for mankind” — Most people are familiar with these words, but how many people know that Neil Armstrong was only seconds away from saying “We didn’t land on the Moon because the computer kept rebooting”?

Apollo 11’s Eagle Lunar Module, in flight

Minutes before the planned moment of the first landing, Apollo 11 was skimming the lunar surface at over 1,300 kilometers per hour. The astronauts Neil Armstrong and Buzz Aldrin were concentrating on flying their strange-looking lunar lander and making sure they weren’t running out of fuel. The last thing they needed was the computer flashing esoteric error codes on its display.

But at a critical juncture, the primitive dashboard on their onboard computer started raising an alarm. Not a single alarm, many.

While the astronauts continued flying, the engineers at Houston Mission Control rushed to interpret the errors — the computer was rebooting mid-flight. And not once, but repeatedly.

The “high resolution” DSKY display the astronauts used to navigate. 
Currently showing the 1202 alarm.

1201!  1202!

With the astronauts low on fuel and perilously close to the surface of the moon, a decision needed to be made quickly. How to act? Use any of several abort options, change the flight plan, or continue flying with a misbehaving computer and risk a crash?

Fortunately, a rigorous training and operations regimen had prepared them for nearly every eventuality. The engineers had thick books, with descriptions of every possible behaviour or combination of reactions in the Apollo systems. Today, we’d call them Runbooks and use AI to search them. In 1969 they used a combination of index cards, documentation, and human memory to match the cryptic error code to the explanation. In a nutshell, the computer was saying “I’ve run out of resources, I’m rebooting and starting over!” again and again, every few seconds.

The 1201 and 1202 alarms were “hidden” edge cases. They were documented in a manual but never expected during a live descent. There was no recovery script. It was the ultimate edge case occurring at the worst possible moment.

1201! 1202!

The astronauts tersely requested information and the flight controllers needed to find an answer quickly — could the Lunar Module be trusted?
So close to the Moon, the difference between landing successfully and crashing was very thin.

1202! 1201!

The flight controller responsible for the guidance computer, Guidance Officer Steve Bales, had an analytical tool at his disposal: the trigger to abort was not merely “the computer is behaving abnormally” but “the Lunar Module is going to crash”. Not merely an alert, but a risk materializing. He checked with his colleagues, each of whom was responsible for a different aspect of the spacecraft’s systems, and collated from them what today we would call the most important Key Performance Indicators (KPIs) or “Golden Signals”.

Is the Lunar Module flying at the right speed, direction and angle? Is it in the right place, at the right time?
Can the astronauts control it? Can the engineers on the ground keep in contact with the spacecraft?
Does it have enough fuel to land?

While juggling the information, Bales used a secret weapon: a "cheat sheet" created during simulation training by another young engineer named Jack Garman. Together, they realized that as long as the alarms were intermittent and the "Golden Signals" of the mission — altitude, velocity, and fuel — remained nominal, the system was still succeeding. 

Bales’s “Cheat Sheet” with the cryptic error messages highlighted (NASA)

Despite the reboots continuing to flash alerts, all the answers to his questions were “Yes”.

The astronauts got the answer they needed — “You are go for landing!”.

By shouting "Go!" over the loop, Bales performed the ultimate act of modern observability: he filtered out the noise to focus on the outcome. He turned a potentially literal "Crash" into a historic success because he knew which signals mattered.
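The reasoning described above can be sketched as a small decision rule. The Python below is purely illustrative: the `DescentStatus` fields, the `go_no_go` function, and the alarm-rate threshold are assumptions invented for this sketch, not anything Mission Control actually ran.

```python
from dataclasses import dataclass


@dataclass
class DescentStatus:
    """Snapshot of the questions asked during the descent (illustrative names)."""
    trajectory_nominal: bool   # right speed, direction, angle, place, and time
    crew_has_control: bool     # the astronauts can still fly the vehicle
    comms_ok: bool             # the ground can still reach the spacecraft
    fuel_sufficient: bool      # enough fuel remains to land
    alarms_last_minute: int    # program alarms (1201/1202) seen recently


def go_no_go(status: DescentStatus, max_alarms_per_minute: int = 5) -> str:
    """Return "GO" while outcomes are nominal and alarms stay intermittent.

    The threshold is a made-up number standing in for "intermittent";
    the real call was a human judgment, not a fixed limit.
    """
    outcomes_ok = all([
        status.trajectory_nominal,
        status.crew_has_control,
        status.comms_ok,
        status.fuel_sufficient,
    ])
    alarms_intermittent = status.alarms_last_minute <= max_alarms_per_minute
    return "GO" if outcomes_ok and alarms_intermittent else "ABORT"


if __name__ == "__main__":
    # Alarms are firing, but every outcome signal is still healthy.
    print(go_no_go(DescentStatus(True, True, True, True, alarms_last_minute=2)))
```

The point of the sketch is that the alarm count alone never decides the result: it matters only alongside the outcome signals.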

The rest, as they say, is history.

Fast forward to 2026.

We may not be landing on the Moon, but we’re dealing with the same problems. Our Golden Signals reflect not our readiness to land on the Moon but the capability of our systems to serve our customers. We often define these critical Golden Signals as Latency (how slowly the system is responding), Traffic (how many requests the system is getting), Saturation (how “full” the queues or containers are), and Errors (this one is rather self-evident).
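As a rough illustration of how these four signals might be derived from raw data, here is a minimal Python sketch. The `RequestSample` type, the `golden_signals` function, and the choice of a 95th-percentile latency are assumptions made for the example; real systems would usually pull these values from a metrics backend.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RequestSample:
    duration_ms: float  # how long the request took
    ok: bool            # False if the request failed


def golden_signals(
    requests: List[RequestSample],
    window_seconds: float,
    queue_depth: int,
    queue_capacity: int,
) -> Dict[str, float]:
    """Summarise one observation window into the four Golden Signals."""
    saturation = queue_depth / queue_capacity
    if not requests:
        return {"latency_p95_ms": 0.0, "traffic_rps": 0.0,
                "error_rate": 0.0, "saturation": saturation}

    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    rank = max(1, (95 * len(durations) + 99) // 100)
    errors = sum(1 for r in requests if not r.ok)

    return {
        "latency_p95_ms": durations[rank - 1],
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "saturation": saturation,
    }


if __name__ == "__main__":
    # 20 requests taking 10, 20, ..., 200 ms; the last two fail.
    window = [RequestSample(10.0 * i, ok=(i <= 18)) for i in range(1, 21)]
    print(golden_signals(window, window_seconds=10, queue_depth=30, queue_capacity=100))
```

Each value on its own is just a number. As in the Apollo story, the useful question is whether the combination still indicates that customers are being served.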

However, we rarely have all these signals available at our fingertips. We need to collect, collate, aggregate, and interpret information throughout the stack of the observability, optimization, security and operations systems we use. Once we’ve extracted insights from our systems, we need to decide what to do with the information – choose and execute the actions which will resolve our problems and return our systems to proper behaviour.

NASA Mission Control – so many engineers, so many signals, so many insights & possible actions

Armstrong, Aldrin, Bales, and all the other engineers on call during that historic event made the right choice and turned their insights into the correct actions, based on their planning, experience, and expertise. They turned an emergency into “business as usual” on the way to the Moon.

But here is the reality of 2026: Your stack is millions of times more complex than the Apollo computers. Can you afford to wait for a 'Neil Armstrong' to intervene during an incident? 

Apollo had fewer signals but clearer decisions.
Modern systems generate more data — but not always better outcomes.

This is where modern solutions such as the newly announced IBM Concert platform come into play: think of Concert as your digital Mission Control.

Concert is constantly examining your environment and keeping up with changes. It correlates the various anomalies (whether logs, events, or metrics) in your system (from applications, Cloud solutions, physical infrastructure, or anywhere else) with the possible solutions (fully automated, human-in-the-loop, or manual) and recommends the next best action to take, at the right time, by the right person.

Concert generates insights, and translates them into actions – Who, What, When, How.

Now, if Neil Armstrong and Buzz Aldrin were in control of your operations or Steve Bales were your SRE, you might not need IBM Concert… but they’re not.

That means you need something just as critical: the ability to turn insight into action — instantly.

1960s Earthrise by Apollo 8 and 2020s Earthset by Artemis II

In 1969, Armstrong & Aldrin had Bales. Bales had Garman and a single sheet of paper.

In 2026, you have the IBM Concert platform.

Apollo 11 proves that the real problem in IT operations isn’t visibility — it’s knowing what to do next.

The views expressed in this article are mine and do not necessarily represent the official position of my employer.