From Outage to Architecture Map with AI

We once had a service outage that quietly lasted for several days. Not because we had no monitoring. Not because QA did not test. But because one part of the system was not covered by any alert or test, and we did not notice soon enough. The first reaction was obvious: “Let’s add one more test.” The question that followed was scarier: “How many other blind spots like this exist?” ...

March 1, 2026 · 6 min

Personal Status Page as an Incident Tool

I built a simple status page for a test tenant with test resolvers so I can see real-time health and a lightweight history of degradation events. It’s not a customer-facing product. It’s an incident tool for me: one place to answer “what’s broken, where, since when?” before anyone else reports it. It’s private and access-controlled, but it lives on a real server. I treat it like production anyway. ...

February 17, 2026 · 5 min
