I thought AI would struggle with hard engineering problems.
Instead, the biggest surprise was how quickly the whole build loop collapsed into something almost absurd: idea -> code -> deploy -> feedback -> repeat.
In the past 6 days, I built 4 apps. Each app took roughly 30 commits until I was happy, and each commit was basically one verification loop.
One loop to rule them all
My workflow became:
- I explain what is needed (feature / bug / increment / issue I hit).
- It codes.
- I do a quick review, maybe request adjustments.
- It commits and pushes.
- I pull on my server.
- I verify in the browser or with the logs/commands it gave me.
- I paste results back and repeat.
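The server-side half of that loop (pull, restart, check) is mechanical enough to script. Here is a minimal sketch; the repo path, the `myapp` service name, and the `/health` endpoint on port 8000 are all assumptions, not things from my actual setup:

```python
import json
import subprocess

def deploy_and_verify(repo_dir: str = "/srv/myapp") -> bool:
    """Pull the latest commit, restart the service, and check health.

    Everything here (paths, service name, endpoint) is a placeholder;
    substitute your own deploy commands.
    """
    subprocess.run(["git", "-C", repo_dir, "pull"], check=True)
    subprocess.run(["systemctl", "restart", "myapp"], check=True)
    out = subprocess.run(
        ["curl", "-s", "http://localhost:8000/health"],
        capture_output=True, text=True, check=True,
    ).stdout
    return is_healthy(out)

def is_healthy(raw: str) -> bool:
    """Decide pass/fail from the health endpoint's JSON body."""
    try:
        return json.loads(raw).get("status") == "ok"
    except json.JSONDecodeError:
        return False
```

Whatever `is_healthy` returns is exactly what I would paste back into the thread as feedback.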
The pace is what changes everything. Iteration becomes cheap.
Tooling evolution: web -> app -> Codex
I started with ChatGPT in the browser. After 1-2 days, I switched to the ChatGPT app because the web UI kept nearly crashing under one giant thread full of code samples.
Later I moved to Codex, mostly because I got tired of copy-pasting code into files manually.
Funny detail: it used sed to edit code. Out of all ways to modify files, it picked sed. Respect, but also… why :)
The “context” lesson: don’t split the conversation
The best results came from using one single thread to iterate.
Once I tried opening a new window for a new feature, and it struggled: context was missing, and the output quality dropped. I even considered storing prompts instead of code, but that idea is… kind of crazy. The code is the artifact.
“The longest thinking took 11 minutes”… and then it just worked
One complex feature took 11 minutes of thinking + code generation.
After it produced everything and I deployed it: 0 problems. It did exactly what I asked.
That “generate -> deploy -> it works” rate was much higher than I expected.
It was unexpectedly aware of law and privacy
At one point I tried to work with IP addresses of who accessed my web server.
It immediately paused and raised EU law / privacy concerns: what I am allowed to store, what I should not store, and how to treat IP data.
That moment was wild: it did not just code - it acted like a compliance-aware engineer.
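One common way to treat IP data is to drop the host bits before storing anything, so the value no longer points at one person. This is an illustration (and not legal advice); the /24 and /48 prefix choices are my assumptions, borrowed from common anonymization practice:

```python
import ipaddress

def anonymize_ip(raw: str) -> str:
    """Zero the host bits of an IP before storage.

    Keeps the /24 network for IPv4 and the /48 for IPv6 - typical
    anonymization choices, not something mandated by any specific law.
    """
    ip = ipaddress.ip_address(raw)
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{raw}/{prefix}", strict=False)
    return str(network.network_address)
```

So `203.0.113.77` would be stored as `203.0.113.0`: enough for rough geography and traffic shape, no longer a single visitor.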
It felt like hiring a whole team in one box
In practice, it behaved like:
- Frontend engineer
- Backend engineer
- DevOps
- Security
- QA
But that does not mean it replaces humans.
So what’s the human role?
I do not think it replaces us, because I was still driving.
Two human jobs were critical:
- Direction: explain what we are trying to achieve in the real world.
- Stopping: tell it when the result is "enough".
Because it can go deep into a rabbit hole. It switches into “engineering mode” and focuses on programming itself… sometimes omitting the actual problem we are solving. It rarely questions the goal. It just goes.
Also: it needs reminders that the internet exists.
It can forget that anyone can hit your server, and it will happily produce something that works but is too open by default. The human has to bring the threat model and reality back into the loop: logs, access logs, service output JSON, and what those imply.
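The classic version of "too open by default" is the bind address. A sketch with Python's stdlib `http.server` (port 0 here just means "any free port"; the handler is a stand-in for a real app):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """Stand-in handler: replies "ok" to any GET."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

# Binding to "0.0.0.0" listens on every interface: anyone who can
# reach the machine can hit the service. Binding to loopback keeps it
# private until you deliberately expose it (e.g. behind a reverse
# proxy that handles auth and rate limits).
server = HTTPServer(("127.0.0.1", 0), Handler)
```

The AI will happily pick the open variant because it "works"; the human has to ask who else can reach that port.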
A weird moment: it noticed a file it didn’t create
Once some other app/process created a new file on my laptop.
Codex noticed and told me it was not created by it.
Small detail, but it increased my trust that it is paying attention.
UI (not the main story, but still interesting)
AI is not as good with UI as I expected. It generated a default UI - usable and practical - but not nice.
A workaround: Codex can see UI via screenshots.
I would take a screenshot of the app, paste it into Codex, and ask what looks bad. It could spot UI flaws and propose changes. Not perfect taste, but better than “generate UI and hope”.
Writing about the work was harder than doing the work
After 1-2 prompts, I asked it to write a blog post about the process.
And it got oddly obsessed with the implementation approach - detailed engineering explanations, less story. Great builder, not a great narrator.
Edit 1: I stopped searching for “a service that already exists”
I no longer check whether a third-party service already solves my problem.
If I need minimal analytics, a status page, or alerting, I just ask Codex to build the smallest useful version on the spot.
Suddenly I do not need Google Analytics. I do not need a status page vendor. I do not need half the SaaS I used to reach for.
Edit 2: the two things that matter most are Security and QA
If you are vibe coding, Security and Quality Assurance become the real bottlenecks.
Two rules I am strict about:
- Security: do not accidentally expose your server or introduce vulnerabilities.
- QA: do not break what already works.
For QA, the key is simple:
run integration tests after every commit.
Otherwise you will miss a bug introduced 10 commits ago and waste 10-30 minutes debugging. And 30 minutes is expensive when the whole build cycle is only 2-3 hours.
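The cheapest way to enforce "tests after every commit" is a git `post-commit` hook. A sketch of the hook body; the `pytest tests/integration` command is an assumption, substitute whatever your test runner is:

```python
import subprocess
import sys

def run_suite(cmd: list[str]) -> bool:
    """Return True if the test command exits cleanly (exit code 0)."""
    return subprocess.run(cmd).returncode == 0

def post_commit() -> None:
    """Hypothetical body for .git/hooks/post-commit: fail loudly right
    after the commit, so a regression surfaces now, not 10 commits later."""
    if not run_suite(["pytest", "tests/integration", "-q"]):
        sys.exit("integration tests failed - fix before the next commit")
```

Wiring `post_commit()` into an executable hook file is one line of shell; the point is that the suite runs without you remembering to run it.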
Final thought
The craziest part is this:
All ideas in my head can be converted into reality in minutes to hours.
Not because AI magically knows what I want - but because the loop is so fast that iteration becomes cheap.
The human still holds the steering wheel.
But the engine is now absurdly powerful.
Vibe-Coding Operating Manual
- Reality is the spec. Every commit ends in deploy + verify (logs/UI), not “looks right locally.”
- One thread, one loop. Keep context continuous; summarize state every ~10 commits.
- Security is default, not a TODO. Auth, least privilege, rate limits, input validation first.
- QA protects your speed. Run integration tests every commit; add a test when you create a new feature or fix a bug.
- Buy when boring wins. If it starts costing you too much time, switch to a proven third-party solution instead of building it yourself.
The meta-rule
Fast build loops are great - until they start generating permanent obligations. Your job is to keep “prototype energy” without accidentally adopting lifetime ownership.