Cursor research

Scaling long-running autonomous coding

January 14, 2026 at 12:00 PM

Summary

TL;DR

Running many coding agents in parallel quickly hits coordination limits. A role-split system (planners create tasks; workers execute them) scaled to hundreds of concurrent agents and to runs lasting weeks.

What actually happened

Started with flat, self-coordinating agents using a shared coordination file and locking
Locking throttled throughput and was brittle when agents failed or ignored locks
Switched to optimistic concurrency control; coordination got simpler but progress still stalled
Introduced role separation: planners generate tasks; workers implement them; a judge decides at the end of each cycle whether to run another iteration
Ran long experiments: a browser from scratch, a Solid→React migration, and product performance work
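The move from locking to optimistic concurrency control can be illustrated with a minimal sketch. The source does not describe the actual coordination file or its format, so `CoordinationStore`, `try_commit`, and `claim_task` are hypothetical names, and the in-memory store stands in for whatever shared state the agents used. Readers never block; a write succeeds only if the state has not changed since it was read.

```python
from dataclasses import dataclass, field


@dataclass
class CoordinationStore:
    """In-memory stand-in for the shared coordination state (hypothetical)."""
    version: int = 0
    claims: dict = field(default_factory=dict)  # task_id -> agent_id

    def read(self):
        # Reads never block: return the version plus a snapshot of the claims.
        return self.version, dict(self.claims)

    def try_commit(self, expected_version, new_claims):
        # The write succeeds only if nobody committed since our read.
        if expected_version != self.version:
            return False
        self.claims = dict(new_claims)
        self.version += 1
        return True


def claim_task(store, agent_id, task_id, max_retries=5):
    """Claim a task via read-modify-write, retrying on version conflicts."""
    for _ in range(max_retries):
        version, claims = store.read()
        if task_id in claims:
            return False  # another agent already owns this task
        claims[task_id] = agent_id
        if store.try_commit(version, claims):
            return True
    return False


store = CoordinationStore()
first = claim_task(store, "agent-a", "task-1")    # succeeds
second = claim_task(store, "agent-b", "task-1")   # task already claimed

# A write based on a stale read is rejected instead of blocking anyone.
stale_version, stale_claims = store.read()
claim_task(store, "agent-a", "task-2")
stale_claims["task-3"] = "agent-b"
stale_write = store.try_commit(stale_version, stale_claims)
```

In this single-threaded sketch the version check is trivially atomic; a real shared store would need an atomic compare-and-swap for `try_commit`. A crashed agent holds nothing that others must wait on, which is the property locking lacked.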

Key numbers

Over 1 million lines of code
1,000 files
Close to a week to build a web browser from scratch
Over three weeks for the Solid→React migration
+266K/-193K edits in that migration
Video rendering 25x faster via a Rust implementation

Why this was hard

Locks became a bottleneck; many agents spent time waiting instead of coding
The coordination layer was fragile when agents crashed or mishandled lock/state updates
Without hierarchy, agents avoided risky end-to-end tasks and churned on “safe” changes
Long-running autonomy introduced drift and tunnel vision, requiring periodic resets
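Why a global lock caps throughput can be shown with a simplified back-of-the-envelope model. This is an assumption for illustration, not a measurement from the source: each agent alternates between independent work and a coordination step that only one agent can perform at a time. The function names and the time values in the usage lines are hypothetical.

```python
def lock_limited_throughput(num_agents, work_time, lock_time):
    """Tasks per unit time when every task needs `lock_time` inside one global lock.

    Simplified model: work runs in parallel across agents, while the locked
    coordination step is serialized across all of them.
    """
    # Without contention, each agent finishes one task per (work + lock) time.
    unconstrained = num_agents / (work_time + lock_time)
    # The lock admits one agent at a time, so it caps total throughput.
    lock_cap = 1.0 / lock_time
    return min(unconstrained, lock_cap)


def effective_agents(num_agents, work_time, lock_time):
    """How many agents' worth of throughput the system actually achieves."""
    rate = lock_limited_throughput(num_agents, work_time, lock_time)
    return rate * (work_time + lock_time)


# Hypothetical numbers: 9 time units of work, 1 time unit holding the lock.
small_swarm = effective_agents(5, 9, 1)    # lock is not yet the limit
large_swarm = effective_agents(100, 9, 1)  # capped by the lock
```

Under these assumed numbers, 100 agents deliver the throughput of only 10; the remaining capacity is spent waiting, which matches the behavior described above.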

How they solved it

Replaced flat coordination with a pipeline: planners define work; workers execute it
Made planning parallel and recursive by allowing planners to spawn sub-planners
Removed worker-to-worker coordination; workers focus only on their assigned task
Added a judge step each cycle to decide whether to continue before starting a fresh iteration
Dropped an “integrator” role after it created bottlenecks; workers handled conflicts themselves
Chose models per role; preferred GPT-5.2 over alternatives for extended autonomy and planning
Iterated heavily on prompting to prevent pathological coordination and focus failures
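The pipeline above can be sketched as a small control loop. This is a toy illustration under stated assumptions, not the system from the source: `Task`, `plan`, `work`, `judge`, and `run` are hypothetical names, the size-based split stands in for real task discovery, recursion stands in for spawning sub-planners (here sequentially rather than in parallel), and workers are plain functions instead of model-backed agents.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    size: int  # hypothetical effort estimate


def plan(goal, size, max_task_size=2):
    """Planner: split a goal into tasks, delegating large pieces to sub-planners."""
    if size <= max_task_size:
        return [Task(goal, size)]
    half = size // 2
    # Each recursive call plays the role of a spawned sub-planner.
    return (plan(f"{goal}/a", half, max_task_size)
            + plan(f"{goal}/b", size - half, max_task_size))


def work(task):
    """Worker: handles only its own task; no worker-to-worker coordination."""
    return f"done:{task.description}"


def judge(results, expected_count):
    """Judge: return True if another cycle is needed."""
    return len(results) < expected_count


def run(goal, size, max_cycles=3, workers=4):
    results = []
    for _ in range(max_cycles):
        tasks = plan(goal, size)  # each cycle starts from a fresh plan
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(work, tasks))  # map preserves task order
        if not judge(results, len(tasks)):
            break
    return results


results = run("browser", 5)
```

The design point the sketch tries to capture is that workers share no state with each other: all coordination lives in the planner's output and the judge's continue-or-stop decision.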

What changed

Hundreds of workers could push to the same branch with minimal conflicts
Large codebases remained understandable enough for new agents to contribute
A long-running agent delivered a merged Rust rewrite that made rendering 25x faster
The Solid→React migration reached the point of passing CI and early checks but still needed careful review

Why this matters beyond this company

Parallelism alone doesn’t scale; coordination mechanisms can dominate throughput
Purely “flat” agent swarms can become conservative and fail to own hard end-to-end work
Simpler orchestration (role separation, fewer chokepoints) can outperform heavier process layers
Prompt design can drive system behavior as much as the harness or model choice

Stealable ideas

Split agents into planners (task discovery) and workers (task execution)
Avoid lock-heavy shared-state coordination; it can become the main bottleneck
Prefer per-role model selection instead of a single universal model
Use periodic fresh-start cycles to limit drift in long-running autonomous work
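Two of these ideas, per-role model selection and fresh-start cycles, can be sketched in a few lines. Everything here is illustrative: the source reports only that GPT-5.2 was preferred for planning and extended autonomy, so the other model names are placeholders, the strings are labels rather than real API identifiers, and `run_with_fresh_starts` is a hypothetical helper that models "context" as a plain list.

```python
# Role-to-model mapping. Only the planner entry reflects the source;
# the worker and judge entries are hypothetical placeholders.
ROLE_MODELS = {
    "planner": "gpt-5.2",
    "worker": "worker-model-placeholder",
    "judge": "judge-model-placeholder",
}


def model_for(role):
    """Look up the model label configured for a role."""
    return ROLE_MODELS[role]


def run_with_fresh_starts(steps, steps_per_cycle):
    """Process steps in cycles, discarding working context between cycles.

    Only a one-line summary survives each cycle, so accumulated context
    (and any drift it carries) is bounded by the cycle length.
    """
    carried_summaries = []
    context_sizes = []
    for start in range(0, len(steps), steps_per_cycle):
        context = []  # fresh start: no history from earlier cycles
        for step in steps[start:start + steps_per_cycle]:
            context.append(step)
        context_sizes.append(len(context))
        carried_summaries.append(f"cycle ending at {context[-1]}")
    return carried_summaries, context_sizes


summaries, sizes = run_with_fresh_starts(["s1", "s2", "s3", "s4", "s5"], 2)
```

With a cycle length of 2, the working context never grows past two steps no matter how long the run lasts.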
