
Top 10 AI Coding Tools

AI coding tools are evolving fast. From Cursor's economics to Replit's agent, we test which ones actually ship. This category tracks the tools that are reshaping how we write code.

By Holt · synthesized from 3 sources

Cursor is bleeding $32 per Pro user every month, and the math is not a secret — it’s sitting in plain sight on Anthropic’s pricing page and in Cursor’s own apology post from July 2025. That’s the number that defines this moment in AI coding tools: the unit economics of selling “unlimited Sonnet” for $20 simply don’t work when a single Sonnet turn costs eleven cents and power users run 200–600 turns a day. Meanwhile, I gave seven AI coding agents the same refactoring task and only two shipped working code on the first try — Claude Code and Cursor (forced Sonnet) passed, while the rest failed or needed human fixes. The trend isn’t that these tools are getting better; it’s that the gap between what they promise and what they deliver is being measured in real dollars and real afternoons.
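The arithmetic behind that claim can be sketched in a few lines. This is a back-of-envelope illustration, not Cursor's actual accounting: the dollar figures are the ones quoted above, while the function names, the 30-day month, and the assumption that inference is the only variable cost are mine.

```python
# Back-of-envelope sketch. Dollar figures are the ones quoted in the article;
# everything else (names, 30-day month, inference as the only cost) is assumed.
PLAN_PRICE_USD = 20.00      # Pro plan price per month
COST_PER_TURN_USD = 0.11    # quoted cost of a single Sonnet turn
LOSS_PER_USER_USD = 32.00   # quoted monthly loss per Pro user
DAYS_PER_MONTH = 30         # assumption, for converting daily to monthly usage


def break_even_turns(plan_price, cost_per_turn):
    """Turns per month at which inference cost equals the plan price."""
    return plan_price / cost_per_turn


def implied_turns(plan_price, loss, cost_per_turn):
    """Turns per month implied by a given loss, if inference is the only cost."""
    return (plan_price + loss) / cost_per_turn


def monthly_margin(turns_per_day, plan_price, cost_per_turn, days=DAYS_PER_MONTH):
    """Plan price minus inference cost for a user with steady daily usage."""
    return plan_price - turns_per_day * days * cost_per_turn


if __name__ == "__main__":
    print(round(break_even_turns(PLAN_PRICE_USD, COST_PER_TURN_USD)))
    print(round(implied_turns(PLAN_PRICE_USD, LOSS_PER_USER_USD, COST_PER_TURN_USD)))
    for turns_per_day in (200, 600):
        print(turns_per_day, round(monthly_margin(turns_per_day, PLAN_PRICE_USD, COST_PER_TURN_USD), 2))
```

Under these assumptions the $20 plan covers roughly 182 turns a month, and a $32 loss corresponds to roughly 473 turns a month. A power user at the quoted 200 to 600 turns a day would pass that within one to three days, so the $32 figure presumably averages in much lighter users.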

The through-line across these three investigations is that inference cost is the hidden governor of every AI coding product, and nobody has solved it. “Cursor is losing $32 per Pro user. Here is the math.” shows a reseller masquerading as a SaaS business: its gross margin is determined by whatever Anthropic publishes next quarter, not by its own engineering. “I gave Replit Agent 3 seven startup ideas: what shipped, what cost me $480” tells the same story from the user side: the $100 in monthly credits on a $95 Pro plan evaporated on one idea (a dog-walker marketplace that shipped broken payouts), and the final bill hit $480. Even the agents that finished in “I tested 7 AI coding agents in one afternoon. Only 2 finished.” (Claude Code at $2.31 and Cursor at roughly $11 for the full test) were only viable because I brought my own model endpoint. The moment you rely on the tool’s built-in model routing, you’re playing a game where the house always wins.

Where these articles agree is that the “agent” label is doing heroic work. Cursor’s Auto router picks the cheapest model that can still finish the task, which is how it markets “unlimited” while actually shipping GPT-4o-mini-tier responses for most requests. Replit’s Agent 3 shipped four of seven ideas, but two of the apps counted as “shipped” (the Whisper + Claude meeting notes app and the Twitch overlay generator) came with broken auth flows and dangling dependencies that a human had to clean up. The agent test’s only clean pass was Claude Code, which is essentially Anthropic’s own CLI wrapper, not a third-party product. Where they disagree is subtler: Cursor and Replit are betting that users will tolerate noise and overage charges in exchange for speed, while Claude Code and Codex CLI (which got a “Partial” for surfacing an untested code path) are betting that users want correctness and honesty first. The market hasn’t picked a winner yet, but the numbers say speed without margin is a Ponzi scheme.

The contrarian angle is that these articles are too kind to the tools that “finished.” Cursor’s agent mode passed my refactoring test, but it renamed a prop without asking and pulled in twelve files where four were needed — that’s not a win, that’s a chatty intern who wastes review time. Replit’s four “shipped” apps included one that required manual deletion of dangling Clerk components and another where the auth flow was so tangled it took longer to untangle than building from scratch. The real missing insight is that AI coding tools are excellent at generating the illusion of progress — they produce a diff that compiles, a UI that renders, a bill that surprises. But the cost of verifying and fixing their output is rarely counted in the marketing. Cursor’s apology post admitted the $20 plan was always a tripwire, but nobody is talking about the tripwire for the user: the hours spent diagnosing a bad diff, the $380 overage on a single weekend, the four agents that shipped code that broke tests in ways that took longer to fix than rewriting.

If you only read one, make it “Cursor is losing $32 per Pro user. Here is the math.” because it exposes the foundational lie of the entire category: these aren’t SaaS products, they’re inference resellers pretending to have margins. The agent test and the Replit experiment are symptoms of the same disease: when your COGS is dictated by a third-party rate card, every feature is a loss leader, and every “pro” user is a liability. Until a tool builds its own model or negotiates a deal that makes the unit economics work, the smart money is on bringing your own API key and treating every agent like a contractor you pay per task, not a subscription you pray will pay for itself.

Go in-depth
  1. Cursor is losing $32 per Pro user. Here is the math.
    Deep Dive · 3 min · Holt
    The July 2025 apology, the Anthropic-only routing, the $20 plan that buys 225 Sonnet calls. The public numbers tell one story: a price hike is mechanically required before year-end.
  2. I tested 7 AI coding agents in one afternoon. Only 2 finished.
    Review · 3 min · Holt
    Same React refactor. Same Sonnet 4.6 endpoint. Claude Code and Cursor one-shot it. Devin spent 87 minutes and broke unrelated tests. Here are the real results, by the run, not by the benchmark.
  3. I gave Replit Agent 3 seven startup ideas: what shipped, what cost me $480
    Review · 3 min · Holt
    Seven ideas, eight days, one credit card. Four apps shipped and two embarrassed me. The honest tally on Replit Agent 3: feature by feature, dollar by dollar.