Coding agents: cost vs speed benchmarks (Stephen Reid)

Interactive table and chart comparing coding agents by tasks per dollar vs time per task.

Source: Stephen Reid's coding agents page (stephenreid.net/agents)

Announcement (Substack excerpt)

If we accept the idea that there’s a threshold level of capability beyond which agentic coding models are all ‘good enough’, and that we’ve crossed it (~Opus 4.5 level), then what becomes interesting is cost vs speed (shown here as tasks per dollar vs time per task, so the upper left is most desirable).

On this basis, Cursor’s Composer 2 is an extreme outlier. It’s roughly as fast as any other agent (in my experience even faster with Fast mode turned on in Cursor), yet a dollar buys you 14 completed tasks, compared with around 0.5–1 tasks per dollar for most frontier models; i.e. Composer is ~15–30x cheaper.

Chart from stephenreid.net/agents, data from the new Artificial Analysis Coding Agent Benchmarks

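The "~15–30x cheaper" figure follows from inverting tasks per dollar: cost per task is 1 / (tasks per dollar), so the cost ratio equals the ratio of tasks per dollar. A minimal check in Python, using only the numbers quoted in the excerpt (the constant and function names are illustrative, not from the page):

```python
# Figures quoted in the excerpt (tasks completed per US dollar).
COMPOSER_2_TASKS_PER_DOLLAR = 14.0
FRONTIER_TASKS_PER_DOLLAR_RANGE = (0.5, 1.0)  # "around 0.5–1" for most frontier models


def cost_advantage(candidate: float, baseline: float) -> float:
    """How many times cheaper per task the candidate is than the baseline.

    Cost per task = 1 / tasks_per_dollar, so
    cost_baseline / cost_candidate = candidate / baseline.
    """
    return candidate / baseline


low, high = FRONTIER_TASKS_PER_DOLLAR_RANGE
# The best frontier figure (1 task/$) gives the smallest advantage.
print(f"{cost_advantage(COMPOSER_2_TASKS_PER_DOLLAR, high):.0f}x")  # 14x
print(f"{cost_advantage(COMPOSER_2_TASKS_PER_DOLLAR, low):.0f}x")   # 28x
```

The exact range is therefore 14–28x, which the excerpt rounds to ~15–30x.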

What the page shows (brief summary)

A table with agent/model configurations and benchmark components (e.g. SWE-Bench Pro, Terminal-Bench v2) plus derived metrics:

  • Tasks per $
  • Time per task (seconds)
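The page does not publish its raw schema, so the sketch below assumes each agent/model configuration is summarised by a task count, total spend, and total wall-clock time; the `BenchmarkRun` type and its field names are hypothetical. It shows how the two derived metrics could be computed from such aggregates, assuming failed attempts still count toward total cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """Hypothetical aggregate for one agent/model configuration."""
    config: str
    tasks_completed: int
    total_cost_usd: float      # assumed to include spend on failed attempts
    total_wall_clock_s: float  # summed wall-clock time across all tasks


def tasks_per_dollar(run: BenchmarkRun) -> float:
    # Higher is better: how many tasks one dollar of spend buys.
    if run.total_cost_usd <= 0:
        raise ValueError("total cost must be positive")
    return run.tasks_completed / run.total_cost_usd


def time_per_task_s(run: BenchmarkRun) -> float:
    # Lower is better: mean wall-clock seconds per completed task.
    if run.tasks_completed <= 0:
        raise ValueError("need at least one completed task")
    return run.total_wall_clock_s / run.tasks_completed


# Made-up numbers, not taken from the page.
run = BenchmarkRun("example-agent", tasks_completed=100,
                   total_cost_usd=50.0, total_wall_clock_s=30_000.0)
print(tasks_per_dollar(run))  # 2.0
print(time_per_task_s(run))   # 300.0
```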

It surfaces trade-offs between:

  • Absolute capability
  • Speed
  • Cost efficiency
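One way to make the three-way trade-off concrete, following the "good enough" argument in the excerpt: first drop configurations below a capability threshold, then keep only those not dominated on speed and cost, i.e. the points nearest the desirable upper left of the chart. This is a sketch of a standard Pareto-frontier filter, not the page's own method; the `AgentPoint` record, the threshold, and all values in the usage example are hypothetical.

```python
from typing import NamedTuple


class AgentPoint(NamedTuple):
    # Hypothetical record; real values would come from the page's table.
    name: str
    capability: float        # e.g. a benchmark score
    tasks_per_dollar: float  # higher is better
    time_per_task_s: float   # lower is better


def dominates(a: AgentPoint, b: AgentPoint) -> bool:
    """True if a is no worse than b on cost and speed, and strictly better on one."""
    no_worse = (a.tasks_per_dollar >= b.tasks_per_dollar
                and a.time_per_task_s <= b.time_per_task_s)
    strictly_better = (a.tasks_per_dollar > b.tasks_per_dollar
                       or a.time_per_task_s < b.time_per_task_s)
    return no_worse and strictly_better


def upper_left_frontier(points: list[AgentPoint], min_capability: float) -> list[AgentPoint]:
    """Keep 'good enough' agents, then drop any dominated by another eligible agent."""
    eligible = [p for p in points if p.capability >= min_capability]
    frontier = [p for p in eligible if not any(dominates(q, p) for q in eligible)]
    # Fastest first, so the list reads left to right across the chart.
    return sorted(frontier, key=lambda p: p.time_per_task_s)


# Made-up points, not taken from the page.
points = [
    AgentPoint("A", 60, 14.0, 200.0),
    AgentPoint("B", 62, 1.0, 180.0),
    AgentPoint("C", 58, 0.5, 300.0),
    AgentPoint("D", 40, 30.0, 100.0),  # cheap and fast, but below the threshold
]
print([p.name for p in upper_left_frontier(points, min_capability=55)])  # ['B', 'A']
```

Here C is dominated by both A and B, while A and B trade speed against cost, so both stay on the frontier.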

Key point

Once models are “good enough” at agentic coding, economics (cost per task) and latency become the differentiators, and Cursor’s Composer 2 stands out as a major cost outlier in these numbers.