We once had a service outage that quietly lasted for several days.
Not because we had no monitoring.
Not because QA did not test.
But because one part of the system was not covered by any alert or test - and we did not notice soon enough.
The first reaction was obvious: “Let’s add one more test.”
The second question was scary: “How many other blind spots like this exist?”
To answer that, we needed something painfully basic… and surprisingly hard:
A complete, shared understanding of what the product actually contains.
This is a perfect example of a number-of-sources heavy task - the kind that feels impossible because the truth is scattered across people, documents, and tribal knowledge.
Here is how I chunked it into something doable with AI.
The real problem: we did not have a product map
Most of our Product Management team joined after many parts were already built. If you want to understand the full product, you can do it the traditional way:
- interview senior engineers
- collect notes
- reconcile contradictions
- repeat for weeks
Eventually, you will get the answer to the “ultimate question”:
- what is our product?
- what modules does it consist of?
- what functionality exists?
But that approach is slow, inconsistent, and hard to verify.
So I looked for a better starting point.
The best “single source” we already had: Service Specification
Then I realized we already had a document that tries to describe the whole thing: the Service Specification.
It is the doc we share with customers during the offering phase - one place that describes:
- what product they are buying
- which modules exist
- what functionality is available to business customers and end users
Is it perfect? Not always. It can be incomplete or not fully updated.
But for our goal - cover what we explicitly promise to customers and their users - it was the best internal material I could find.
So the new plan became:
Map product functionality from Service Specification -> then use the map to drive monitoring/alerting/tests coverage.
The method: Chunk -> Verify -> Scale
I used the same technique I have used for other “huge tasks”:
1) Chunk: start small, define boundaries
I took our high-level product structure: ~11 modules we already defined.
Important: these modules must be comprehensive - together they cover the whole product.
Then I started mapping module-by-module to the Service Specification.
2) Verify: make output reviewable (not “chat-only”)
Before processing content, I did one preparation step that changed everything:
- I asked Codex to store output into a CSV
- each row had a unique ID
- the structure was consistent so we could reference specific items precisely
Why this matters:
- you can filter, sort, dedupe
- you can review systematically
- you can say: “Explain row 183” instead of “that thing you mentioned earlier”
- you can collaborate with humans (QA/Eng) without losing context
This is the difference between chatting and building.
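To make this concrete, here is a minimal sketch of the idea (the column names and ID format are my own illustration, not the actual schema we used): every row gets a stable, unique ID at write time, so reviewers can point at "F-183" instead of "that thing you mentioned earlier".

```python
import csv
import io

# Hypothetical columns - the real sheet evolved over the project.
FIELDNAMES = ["id", "module", "functionality", "notes"]

def write_rows(rows, fh):
    """Write rows to CSV, assigning each one a stable, unique ID."""
    writer = csv.DictWriter(fh, fieldnames=FIELDNAMES)
    writer.writeheader()
    for n, row in enumerate(rows, start=1):
        writer.writerow({"id": f"F-{n:03d}", **row})

buf = io.StringIO()
write_rows(
    [{"module": "Admin Console", "functionality": "User management", "notes": ""}],
    buf,
)
print(buf.getvalue().splitlines()[1])  # -> F-001,Admin Console,User management,
```

The point is not the code itself but the contract: consistent columns plus stable IDs turn a chat transcript into a reviewable artifact.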
3) Scale: repeat the same pipeline across modules
Once the process worked for one module, I repeated it for the rest.
Get the right humans onboard early (or your work dies later)
Before I went deep, I wanted to make sure the output would actually be used.
QA was crucial for this exercise because they will be the ones who:
- follow up on gaps
- create tests
- ensure tests run at the right moments
So I got QA onboard early. That made the mapping a practical asset, not a “PM spreadsheet artifact”.
Processing the first module: Admin Console
I started with Admin Console because I had the most hands-on experience with it.
Workflow:
- I gave Codex the chapter list of the Service Specification (names only, no content). This helped it understand the terrain.
- Codex created a high-level draft in the CSV.
- I copy-pasted the full Admin Console section and asked it to extract functionality into the structure.
The first output was already “heavy” and it immediately raised important questions:
- Is this truly product functionality or more like a service we provide?
- Is the granularity right?
- Are there duplicates or “umbrella” items?
We did a few refinement rounds. I was not fully happy yet - but I continued to the next module to see if the pattern holds.
Reality: modules are not clean (dependencies matter)
A key complication showed up quickly:
Some things “live” in one module’s UI but belong to another module conceptually.
Example:
- alerting/reporting settings might appear in Admin Console
- but we treat Alerting & Reporting as a standalone module with its own PM
So the map needed to capture cross-module relationships, otherwise it would not be usable for implementation.
The most expensive phase: Quality verification (~50% of the time)
Extraction was fast.
Verification took most of the effort.
Instead of manually checking ~300 items, I asked Codex to run a quality check with a simple instruction:
“Do a quality check and tell me where you need a human answer or verification.”
It was surprisingly effective. It flagged issues like:
1) Duplication / “umbrella” items
Examples:
- “Admin Console” under “Admin Console”
- “Processing” and “Processing of alerts” both under Alerting & Reporting
Codex spotted them and helped remove/merge them.
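As a rough sketch of this kind of check (my own simplified heuristics, not the actual logic Codex ran): flag items named exactly like their own module, and items whose name is a prefix of a sibling's name.

```python
def flag_suspects(rows):
    """Flag likely duplicate/umbrella items in a list of (module, functionality).

    Two simple heuristics:
    - an item named exactly like its module ("Admin Console" under "Admin Console")
    - an item whose name is a prefix of a sibling ("Processing" vs "Processing of alerts")
    """
    suspects = []
    for module, name in rows:
        if name.lower() == module.lower():
            suspects.append((name, "named like its own module"))
    names = [n for _, n in rows]
    for a in names:
        for b in names:
            if a != b and b.lower().startswith(a.lower() + " "):
                suspects.append((a, f"possible umbrella of '{b}'"))
    return suspects

rows = [
    ("Admin Console", "Admin Console"),
    ("Alerting & Reporting", "Processing"),
    ("Alerting & Reporting", "Processing of alerts"),
]
for item, reason in flag_suspects(rows):
    print(f"{item}: {reason}")
```

Heuristics like these only surface candidates; deciding whether to merge or keep an item still needs a human.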
2) Service vs functionality categorization
Sometimes it suggested the right correction; sometimes I decided the item was fine as-is.
3) Visual sanity check beats raw review
With ~300 items, reviewing line-by-line is painful.
The best review was later:
- build a dashboard / chart
- see the map visually
- spot “this looks wrong” patterns quickly
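Even without a BI tool, a quick-and-dirty version of that visual check is a text bar chart of items per module (the counts below are invented for illustration): a suspiciously short bar is exactly the "this looks wrong" pattern you are hunting for.

```python
from collections import Counter

# Hypothetical per-item module labels; the real data came from the CSV.
modules = (
    ["Admin Console"] * 42
    + ["Alerting & Reporting"] * 3  # suspiciously low -> "this looks wrong"
    + ["Reporting"] * 25
)

counts = Counter(modules)
for module, n in counts.most_common():
    print(f"{module:<22} {'#' * (n // 2)} ({n})")
```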
The final structure I ended up with
After ~300 functionalities across ~10 modules, I added one intermediate layer because the jump from module -> functionality was too steep.
Final structure:
- product - one named product we sell
- module - top-level split (~10)
- submodule - second split (~30 areas)
- functionality - granular level (~300)
And practical fields for execution:
related_to_module
Example: the user config UI is in Admin Console, but the feature belongs to Alerting & Reporting.
This becomes a dependency map for engineering and QA.
functionality_for
Who experiences it: business customer vs end user.
Important because coverage must validate both:
- customer changes something
- end user actually sees/experiences the change
offering_type
Is it product functionality or a service? (product vs services)
Services typically cannot be covered the same way by monitoring/alerts/tests.
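To make the final structure concrete, here is one hypothetical row expressed as a record (the values are invented; the field names come from the structure above):

```python
# One illustrative row of the functionality map (values are made up).
row = {
    "id": "F-183",
    "product": "Our Product",
    "module": "Admin Console",                 # top-level split (~10)
    "submodule": "User Configuration",         # second split (~30)
    "functionality": "Edit alert recipients",  # granular level (~300)
    "related_to_module": "Alerting & Reporting",  # cross-module dependency
    "functionality_for": "business customer",     # vs "end user"
    "offering_type": "product",                   # vs "service"
}

# Services are handled differently in monitoring/alerting/test planning.
needs_coverage = row["offering_type"] == "product"
print(needs_coverage)  # -> True
```

A record like this is enough to drive coverage work: filter to `offering_type == "product"`, then follow `related_to_module` to find cross-module dependencies.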
The mistake I made (so you do not repeat it)
I did not evaluate output quality early enough.
I should have done deeper review on the first small chunk and “trained” the model with feedback before scaling to everything.
Instead, I had to do quality cleanup on the full dataset later.
Rule:
Review small -> lock the pattern -> then scale.
Not scale -> then rework everything.
The biggest takeaway: structure beats chat
I spent ~2-3 hours on this exercise - and the results were worth it.
But the biggest lesson was not “AI is smart.”
It was this:
AI becomes dramatically more useful when you force it into a structure you can inspect.
If you only use chat:
- you cannot see the data clearly
- you cannot verify systematically
- you cannot collaborate
- you get lost
When you use a shared artifact (CSV/IDs/dashboard/canvas):
- you iterate like engineers do
- you can verify and correct fast
- you can scale safely
For number-of-sources heavy tasks, do not use AI as a talk partner. Use it as a structured-work copilot.