We once had a service outage that quietly lasted for several days.
Not because we had no monitoring.
Not because QA did not test.
But because one part of the system was not covered by any alert or test - and we did not notice soon enough.
The first reaction was obvious: “Let’s add one more test.”
The second question was scary: “How many other blind spots like this exist?”
To answer that, we needed something painfully basic… and surprisingly hard:
A complete, shared understanding of what the product actually contains.
This is a perfect example of a number-of-sources heavy task - the kind that feels impossible because the truth is scattered across people, documents, and tribal knowledge.
Here is how I chunked it into something doable with AI.
The real problem: we did not have a product map
Most of our Product Management team joined after many parts were already built. If you want to understand the full product, you can do it the traditional way:
- interview senior engineers
- collect notes
- reconcile contradictions
- repeat for weeks
Eventually, you will get the answer to the “ultimate question”:
- what is our product?
- what modules does it consist of?
- what functionality exists?
But that approach is slow, inconsistent, and hard to verify.
So I looked for a better starting point.
The best “single source” we already had: Service Specification
Then I realized we already had a document that tries to describe the whole thing: the Service Specification.
It is the doc we share with customers during the offering phase - one place that describes:
- what product they are buying
- which modules exist
- what functionality is available to business customers and end users
Is it perfect? Not always. It can be incomplete or not fully updated.
But for our goal - cover what we explicitly promise to customers and their users - it was the best internal material I could find.
So the new plan became:
Map product functionality from Service Specification -> then use the map to drive monitoring/alerting/tests coverage.
The method: Chunk -> Verify -> Scale
I used the same technique I have used for other “huge tasks”:
1) Chunk: start small, define boundaries
I took our high-level product structure: ~11 modules we already defined.
Important: these modules must be comprehensive - together they cover the whole product.
Then I started mapping module-by-module to the Service Specification.
2) Verify: make output reviewable (not “chat-only”)
Before processing content, I did one preparation step that changed everything:
- I asked Codex to store output into a CSV
- each row had a unique ID
- the structure was consistent so we could reference specific items precisely
Why this matters:
- you can filter, sort, dedupe
- you can review systematically
- you can say: “Explain row 183” instead of “that thing you mentioned earlier”
- you can collaborate with humans (QA/Eng) without losing context
This is the difference between chatting and building.
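To make this concrete, here is a minimal sketch of the idea (the column names and ID format are my own illustration, not the actual schema we used): every row gets a stable, unique ID at write time, so reviewers can point at "F-183" instead of "that thing you mentioned earlier".

```python
import csv
import io

# Hypothetical columns - the real sheet evolved over the project.
FIELDNAMES = ["id", "module", "functionality", "notes"]

def write_rows(rows, fh):
    """Write rows to CSV, assigning each one a stable, unique ID."""
    writer = csv.DictWriter(fh, fieldnames=FIELDNAMES)
    writer.writeheader()
    for n, row in enumerate(rows, start=1):
        writer.writerow({"id": f"F-{n:03d}", **row})

buf = io.StringIO()
write_rows(
    [{"module": "Admin Console", "functionality": "User management", "notes": ""}],
    buf,
)
print(buf.getvalue().splitlines()[1])  # -> F-001,Admin Console,User management,
```

The point is not the code itself but the contract: consistent columns plus stable IDs turn a chat transcript into a reviewable artifact.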
3) Scale: repeat the same pipeline across modules
Once the process worked for one module, I repeated it for the rest.
Get the right humans onboard early (or your work dies later)
Before I went deep, I wanted to make sure the output would actually be used.
QA was crucial for this exercise because they will be the ones who:
- follow up on gaps
- create tests
- ensure tests run at the right moments
So I got QA onboard early. That made the mapping a practical asset, not a “PM spreadsheet artifact”.
Processing the first module: Admin Console
I started with Admin Console because I had the most hands-on experience with it.
Workflow:
- I gave Codex the chapter list of the Service Specification (names only, no content). This helped it understand the terrain.
- Codex created a high-level draft in the CSV.
- I copy-pasted the full Admin Console section and asked it to extract functionality into the structure.
The first output was already “heavy” and it immediately raised important questions:
- Is this truly product functionality or more like a service we provide?
- Is the granularity right?
- Are there duplicates or “umbrella” items?
We did a few refinement rounds. I was not fully happy yet - but I continued to the next module to see if the pattern holds.
Reality: modules are not clean (dependencies matter)
A key complication showed up quickly:
Some things “live” in one module’s UI but belong to another module conceptually.
Example:
- alerting/reporting settings might appear in Admin Console
- but we treat Alerting & Reporting as a standalone module with its own PM
So the map needed to capture cross-module relationships, otherwise it would not be usable for implementation.
The most expensive phase: Quality verification (~50% of the time)
Extraction was fast.
Verification took most of the effort.
Instead of manually checking ~300 items, I asked Codex to run a quality check with a simple instruction:
“Do a quality check and tell me where you need a human answer or verification.”
It was surprisingly effective. It flagged issues like:
1) Duplication / “umbrella” items
Examples:
- “Admin Console” under “Admin Console”
- “Processing” and “Processing of alerts” both under Alerting & Reporting
Codex spotted them and helped remove/merge them.
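As a rough sketch of this kind of check (my own simplified heuristics, not the actual logic Codex ran): flag items named exactly like their own module, and items whose name is a prefix of a sibling's name.

```python
def flag_suspects(rows):
    """Flag likely duplicate/umbrella items in a list of (module, functionality).

    Two simple heuristics:
    - an item named exactly like its module ("Admin Console" under "Admin Console")
    - an item whose name is a prefix of a sibling ("Processing" vs "Processing of alerts")
    """
    suspects = []
    for module, name in rows:
        if name.lower() == module.lower():
            suspects.append((name, "named like its own module"))
    names = [n for _, n in rows]
    for a in names:
        for b in names:
            if a != b and b.lower().startswith(a.lower() + " "):
                suspects.append((a, f"possible umbrella of '{b}'"))
    return suspects

rows = [
    ("Admin Console", "Admin Console"),
    ("Alerting & Reporting", "Processing"),
    ("Alerting & Reporting", "Processing of alerts"),
]
for item, reason in flag_suspects(rows):
    print(f"{item}: {reason}")
```

Heuristics like these only surface candidates; deciding whether to merge or keep an item still needs a human.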
2) Service vs functionality categorization
Sometimes it suggested the right correction; sometimes I decided the item was fine as-is.
3) Visual sanity check beats raw review
With ~300 items, reviewing line-by-line is painful.
The best review was later:
- build a dashboard / chart
- see the map visually
- spot “this looks wrong” patterns quickly
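Even without a BI tool, a quick-and-dirty version of that visual check is a text bar chart of items per module (the counts below are invented for illustration): a suspiciously short bar is exactly the "this looks wrong" pattern you are hunting for.

```python
from collections import Counter

# Hypothetical per-item module labels; the real data came from the CSV.
modules = (
    ["Admin Console"] * 42
    + ["Alerting & Reporting"] * 3  # suspiciously low -> "this looks wrong"
    + ["Reporting"] * 25
)

counts = Counter(modules)
for module, n in counts.most_common():
    print(f"{module:<22} {'#' * (n // 2)} ({n})")
```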
The final structure I ended up with
After ~300 functionalities across ~10 modules, I added one intermediate layer because the jump from module -> functionality was too steep.
Final structure:
- product - one named product we sell
- module - top-level split (~10)
- submodule - second split (~30 areas)
- functionality - granular level (~300)
And practical fields for execution:
related_to_module
Example: the user config UI is in Admin Console, but the feature belongs to Alerting & Reporting.
This becomes a dependency map for engineering and QA.
functionality_for
Who experiences it: business customer vs end user.
Important because coverage must validate both:
- customer changes something
- end user actually sees/experiences the change
offering_type
Is it product functionality or a service? (product vs services)
Services typically cannot be covered the same way by monitoring/alerts/tests.
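To make the final structure concrete, here is one hypothetical row expressed as a record (the values are invented; the field names come from the structure above):

```python
# One illustrative row of the functionality map (values are made up).
row = {
    "id": "F-183",
    "product": "Our Product",
    "module": "Admin Console",                 # top-level split (~10)
    "submodule": "User Configuration",         # second split (~30)
    "functionality": "Edit alert recipients",  # granular level (~300)
    "related_to_module": "Alerting & Reporting",  # cross-module dependency
    "functionality_for": "business customer",     # vs "end user"
    "offering_type": "product",                   # vs "service"
}

# Services are handled differently in monitoring/alerting/test planning.
needs_coverage = row["offering_type"] == "product"
print(needs_coverage)  # -> True
```

A record like this is enough to drive coverage work: filter to `offering_type == "product"`, then follow `related_to_module` to find cross-module dependencies.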
The mistake I made (so you do not repeat it)
I did not evaluate output quality early enough.
I should have done deeper review on the first small chunk and “trained” the model with feedback before scaling to everything.
Instead, I had to do quality cleanup on the full dataset later.
Rule:
Review small -> lock the pattern -> then scale.
Not scale -> then rework everything.
The biggest takeaway: structure beats chat
I spent ~2-3 hours on this exercise - and the results were worth it.
But the biggest lesson was not “AI is smart.”
It was this:
AI becomes dramatically more useful when you force it into a structure you can inspect.
If you only use chat:
- you cannot see the data clearly
- you cannot verify systematically
- you cannot collaborate
- you get lost
When you use a shared artifact (CSV/IDs/dashboard/canvas):
- you iterate like engineers do
- you can verify and correct fast
- you can scale safely
For number-of-sources heavy tasks, do not use AI as a talk partner. Use it as a structured-work copilot.