The Autonomous Build Loop Playbook: 8 Lessons From Letting AI Build an Enterprise App
85 tasks. 16 hours. Zero human intervention. Then reality hit.
We recently ran an experiment: let Claude Code autonomously build an enterprise Contract Lifecycle Management (CLM) app from scratch. NestJS backend, Next.js frontend, Prisma ORM, multi-tenancy, RBAC, the full enterprise stack. 85 engineering tasks executed in sequence, no human in the loop. The build succeeded. TypeScript compiled. Tests passed. Then we tried to use it, and found 27 integration bugs in 4.5 hours. This post isn't the war story (that's the link above). This is the distilled playbook: the 8 architectural lessons and the checklist I wish I'd had before we started. If you're considering an autonomous AI build for anything beyond a weekend prototype, bookmark this.

Lesson 1: The app/ vs src/app/ Directory Trap
What happened: Our Next.js project ended up with TWO sets of pages:
- `packages/web/app/(dashboard)/...`: the one Next.js actually serves
- `packages/web/src/app/(dashboard)/...`: dead code, never served
Three Claude Code sessions spent ~30 minutes fixing files in the wrong directory. Every fix was invisible to users.
Why: Next.js resolves the `app/` directory from the project root. When `@/` in tsconfig maps to `./`, both directories compile fine, but only `app/` is served. The build loop created both during scaffolding and nobody caught the duplication.
Prevention:
- Before any frontend fix, verify which directory Next.js serves: `ls packages/web/app/ packages/web/src/app/`. If both exist, you have a problem (see the guard sketch below).
- Add to your agent's context: "Frontend pages are in `packages/web/app/`, NOT `packages/web/src/app/`"
- In task prompts, always specify the exact file path, not just the route
- Better yet: delete `src/app/` entirely if it's not supposed to exist
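To automate that first check, a small guard script can fail the build whenever both directories exist. A minimal sketch, assuming this post's monorepo layout (the paths are the only inputs):

```typescript
// check-app-dirs.ts: fail fast if Next.js could be serving the wrong tree.
import { existsSync } from "node:fs";

const candidates = ["packages/web/app", "packages/web/src/app"];
const present = candidates.filter((dir) => existsSync(dir));

if (present.length > 1) {
  console.error(`Duplicate app directories: ${present.join(", ")}`);
  process.exit(1); // break the build before anyone edits dead code
}
console.log(`Next.js app directory: ${present[0] ?? "none found!"}`);
```

Wire it into CI or a pre-commit hook so the duplication can never silently reappear.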
Lesson 2: Speculative Code = Integration Debt
This was the big one. What happened: The build loop wrote 85 tasks' worth of frontend pages based on assumed API response shapes. When real data hit the pages, 20+ fields didn't match.

| Frontend Assumed | API Actually Returns |
|---|---|
| `section.content`, `section.type` | `section.title`, `section.clauses[]` |
| `settings.mergeFields` | Doesn't exist |
| `summary.discountPct` | `discount_pct` (snake_case) |
| `financialsApi.getFinancials()` → flat object | `{data: {data: {...}}}` (double-wrapped) |
| `workflowsApi.list().data` | Already unwrapped; `.data` is `undefined` |
|  | Radix UI requires non-empty string |

Prevention:
- Budget 30-40% of total time for integration testing. Our 85 tasks took ~3 hours to build, then 4.5 hours to fix integration bugs.
- After every frontend page task, add a follow-up: "Verify this page renders with real API data in a browser"
- Write API response shapes into a shared contract (`API_CONTRACTS.md`) that both backend and frontend tasks reference
- Never trust speculative type definitions: always `curl` the actual endpoint and compare (a sketch of that check follows this list)
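To make "curl and compare" concrete, here is a hedged sketch of a contract check. The endpoint URL, auth handling, and `expectedKeys` are placeholders you would source from `API_CONTRACTS.md`, not our actual values:

```typescript
// contract-check.ts: diff a live endpoint's keys against the assumed shape.
const expectedKeys = ["id", "title", "clauses", "createdAt"]; // placeholder

async function checkContract(url: string): Promise<void> {
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${process.env.API_TOKEN ?? ""}` },
  });
  const body = await res.json();
  const actual = Object.keys(body.data ?? body); // tolerate the envelope
  const missing = expectedKeys.filter((k) => !actual.includes(k));
  const unexpected = actual.filter((k) => !expectedKeys.includes(k));
  if (missing.length || unexpected.length) {
    console.error({ url, missing, unexpected });
    process.exit(1);
  }
  console.log(`Contract matches: ${url}`);
}

checkContract("http://localhost:3000/api/templates/1"); // placeholder URL
```

Run it against every endpoint after the build loop finishes; a key-level diff catches most of the mismatches in the table above.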
Lesson 3: Response Envelope Inconsistency
What happened: The API wraps responses in `{data, meta, error}` via a global interceptor. The `apiClient` unwraps `.data`. But some endpoints double-wrap: `{data: {data: [...]}}`. Frontend code had inconsistent handling: some pages unwrapped twice, some once, some not at all.
This affected financials, reports, workflows, and templates. Four different features, same root cause.
Prevention:
- Standardize: the interceptor should NEVER double-wrap. Audit every controller.
- Add a test: `Object.keys(response.data)` should never contain only `"data"`
- In `apiClient`, add a safety valve: if `response.data.data` exists and `response.data` has no other meaningful keys, auto-unwrap (sketched below)
- Document the envelope contract in your agent's instructions
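Here is a minimal sketch of that safety valve as an axios response interceptor, assuming the `{data, meta, error}` envelope described above. Treat it as an illustration of the idea, not our exact `apiClient`:

```typescript
import axios from "axios";

export const apiClient = axios.create({ baseURL: "/api" });

apiClient.interceptors.response.use((response) => {
  const body = response.data;
  // Double-wrapped: the outer object has only envelope keys, and its
  // `data` field is itself an envelope with a `data` field.
  const isDoubleWrapped =
    body !== null &&
    typeof body === "object" &&
    "data" in body &&
    Object.keys(body).every((k) => ["data", "meta", "error"].includes(k)) &&
    body.data !== null &&
    typeof body.data === "object" &&
    "data" in body.data;
  if (isDoubleWrapped) {
    response.data = body.data; // collapse {data: {data: X}} into {data: X}
  }
  return response;
});
```

The real fix is auditing the controllers; the valve just keeps one stray endpoint from breaking four features again.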
Lesson 4: Null Safety Is Not Optional
What happened: Pages crashed on `.length`, `.replace()`, `.toFixed()`, `.map()` called on `undefined` or `null`. Every page that accessed nested objects (`contract.counterparty.name`, `template.content.sections`) was a crash waiting to happen.
This isn't an AI-specific problem. But AI-generated code is systematically worse at null safety because it writes the happy path — the path where data exists and is shaped correctly. It doesn't defensively code for the path where the API returns null for a field the UI assumes is always present.
Prevention:
- Add to agent instructions: "ALL property access on API data MUST use optional chaining (`?.`) and nullish coalescing (`?? fallback`)"
- Default arrays to `[]` and objects to `{}` at the API client level (see the sketch below)
- Add ESLint rules for unchecked nested property access
- In task prompts, explicitly state: "Add null safety everywhere"
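A minimal sketch of client-level defaults, assuming the envelope contract from Lesson 3; the `Envelope` type and paths are this post's assumptions, not a library API:

```typescript
// Defensive defaults at the API-client boundary.
interface Envelope<T> {
  data?: T;
  meta?: Record<string, unknown>;
  error?: { message: string };
}

async function getList<T>(path: string): Promise<T[]> {
  const res = await fetch(`/api${path}`);
  const body = (await res.json()) as Envelope<T[]>;
  return body.data ?? []; // pages can always .map() safely
}

async function getOne<T>(path: string): Promise<T | null> {
  const res = await fetch(`/api${path}`);
  const body = (await res.json()) as Envelope<T>;
  return body.data ?? null; // explicit null forces callers to handle absence
}
```

Defaulting at the boundary means one fix instead of optional chaining sprinkled across 80 pages, though the chaining rule is still worth enforcing for nested fields.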
Lesson 5: Claude Code Session Management
What happened:
- Sessions with prompts >~4KB in the `-p` flag crashed or got suspended (SIGTSTP)
- Sessions hit the 30-turn limit before finishing
- No reliable way to monitor progress without `screen` + `hardcopy`

Prevention:
- Keep `-p` prompts short. Put details in a file: `-p "Read /tmp/task.md and execute"`
- Use `screen` for long sessions
- Monitor via `git diff --stat HEAD`: the most reliable progress indicator (a minimal watcher is sketched below)
- Break large tasks into focused sessions: 1 session per page group, not 1 session for all 13 pages
- Max 20-30 turns per session for edit-heavy work; save 5 turns for verification
- Always specify exact file paths; don't let the agent search or guess
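If you want that monitoring hands-off, here is a sketch of a watcher that polls `git diff --stat HEAD` and prints only on change. The 30-second interval and filename are arbitrary assumptions:

```typescript
// monitor.ts: print `git diff --stat HEAD` whenever it changes.
import { execSync } from "node:child_process";

function snapshot(): string {
  return execSync("git diff --stat HEAD", { encoding: "utf8" }).trim();
}

let last = "";
setInterval(() => {
  const current = snapshot();
  if (current !== last) {
    console.log(`[${new Date().toISOString()}]\n${current}\n`);
    last = current;
  }
}, 30_000); // poll every 30s; tune to taste
```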
Lesson 6: Don't Declare Victory Without Browser Verification
What happened: 30/30 API tests. 38/38 frontend tests. 3/3 Playwright E2E. All passing. Every detail page broken. The tests only checked list pages and HTTP status codes. They never rendered detail views with real data. They never tested the full auth flow. They never verified that API query parameters matched what the frontend sent.
Prevention:
- Tests must cover detail pages, not just list pages
- After ANY fix session, open a browser and click through every page
- Add Playwright tests that navigate to `/templates/:id`, `/contracts/:id`, etc. (see the sketch after this list)
- "Passing tests" ≠ "working app": tests only cover what they test
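As one shape such detail-page tests could take, here is a hedged Playwright sketch; the routes and the `main` selector are placeholders for your app, and a real suite would log in first:

```typescript
import { test, expect } from "@playwright/test";

// Placeholder routes: seed IDs that exist in your test database.
const detailRoutes = ["/templates/1", "/contracts/1"];

for (const route of detailRoutes) {
  test(`detail page renders without crashing: ${route}`, async ({ page }) => {
    const errors: string[] = [];
    page.on("pageerror", (err) => errors.push(err.message)); // runtime crashes

    await page.goto(route);
    await expect(page.locator("main")).toBeVisible();
    expect(errors).toEqual([]); // a 200 with a crashed React tree still fails
  });
}
```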
Lesson 7: Scaffolding First, Always
What happened: The build loop created 80+ feature pages before creating layouts, routing, auth flow, or a working login → dashboard path. We had a complete feature set and couldn't even log in.
Prevention:
- Task #1 should always be: root layout, auth layout, dashboard layout, login page, working auth flow
- Verify the happy path (login → dashboard → one list page → one detail page) before building any features
- Scaffolding isn't glamorous, but without it, nothing else works

Lesson 8: Snake_case vs CamelCase at the Boundary
What happened: Backend (Prisma/PostgreSQL) uses snake_case. Some API endpoints return snake_case (`discount_pct`), others camelCase (`discountPct`). Frontend types assume camelCase.
Prevention:
- Standardize: the API should ALWAYS return camelCase (transform in the serialization layer)
- Add a shared transform utility in `apiClient` (a sketch follows)
- If using Prisma, configure `@map` for columns but return camelCase from controllers
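A minimal sketch of such a transform utility; it is illustrative (no handling for key collisions, Dates, or Maps), not the exact code we shipped:

```typescript
// camelize.ts: recursively rewrite snake_case keys to camelCase.
function toCamel(key: string): string {
  return key.replace(/_([a-z])/g, (_, c: string) => c.toUpperCase());
}

export function camelizeKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(camelizeKeys);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) => [
        toCamel(k),
        camelizeKeys(v),
      ])
    );
  }
  return value;
}

// camelizeKeys({ discount_pct: 5 })  =>  { discountPct: 5 }
```

Apply it once in the `apiClient` response path so frontend types only ever see camelCase.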
The Build Loop Checklist
For your next autonomous build, run this checklist BEFORE declaring done:
□ Verify which directory the framework serves (`app/` vs `src/app/`)
□ `curl` every API endpoint; save response shapes to a reference file
□ Compare frontend type definitions against actual API responses
□ Open every list page in browser — verify data loads
□ Click into every detail page — verify no crashes
□ Check all form submissions (create, edit, delete)
□ Verify response envelope consistency (no double-wrapping)
□ Run `tsc --noEmit` (catches TypeScript errors)
□ Run existing test suites
□ Check browser console for runtime errors on every page
□ Verify auth flow: login → navigate → refresh → still authenticated
□ Commit and document what was fixed
Time budget: 60% build, 40% integration + verification.
That ratio feels wrong. It feels like you should spend 90% building and 10% testing. But after this experiment, I can tell you: integration bugs arrive in bulk (27 of them here), and each takes longer to fix than a build task because it requires understanding the seam between two systems.
The Bigger Picture
Here's what this experiment crystallized for me. AI can generate code at an astonishing pace. 85 tasks in 16 hours. A full enterprise application (auth, RBAC, multi-tenancy, CRUD for 9 entity types, search, reporting, admin) generated autonomously. Day 1 is genuinely magical. Day 2 is 27 bugs and the sobering realization that generation is the easy part. The hard part is:
- Integration (do the pieces work together?)
- Optimization (does it perform under real conditions?)
- Reliability (does it keep working tomorrow?)
- Continuous improvement (does it get better over time?)
This is the gap between prototype and production. It's where demos go to die. And it's exactly the gap that Vizops exists to close.
We don't just build agents — we optimize them. We use reinforcement learning to continuously improve agent performance against multiple competing objectives (cost, accuracy, speed, safety). The same way a human engineer would iterate on those 27 bugs, but automated, continuous, and scalable.
The autonomous build loop is the future. I'm convinced of that. But the loop doesn't end at code generation. It ends when the system actually works — and keeps working.
If you're running autonomous AI builds and want to talk about what you're learning, reach out at contact@vizops.ai. We're collecting these patterns because the playbook is still being written.