Introducing GPT-5.5
OpenAI model release focused on agentic coding, computer use, and long-horizon knowledge work.
Key Claim (excerpt)
GPT-5.5 excels at writing and debugging code, researching online, analyzing data, creating documents and spreadsheets, operating software, and moving across tools until a task is finished. The gains are especially clear in agentic coding, computer use, knowledge work, and early scientific research—areas where progress depends on reasoning across context and taking action over time.
Overview
GPT‑5.5 is positioned as OpenAI’s next step toward “a new way of getting work done on a computer”: more autonomous planning and tool use on messy, multi‑part tasks, with serving latency kept similar to GPT‑5.4.
Highlights (from the post)
- Stronger at planning, tool use, and self-checking over longer tasks
- Improved efficiency: often fewer tokens and fewer retries on the same Codex tasks
- Safety: described as OpenAI’s strongest safeguards to date, with expanded testing (cybersecurity/biology) and feedback from ~200 early-access partners
- Rollout: Plus/Pro/Business/Enterprise users in ChatGPT and Codex; API “very soon” (per post)
Benchmarks (selected, from the post)
- Terminal‑Bench 2.0: 82.7% (vs 75.1% for GPT‑5.4)
- OSWorld‑Verified: 78.7% (vs 75.0% for GPT‑5.4)
- BrowseComp: 84.4% (GPT‑5.5), 90.1% (GPT‑5.5 Pro)
- FrontierMath Tier 1–3: 51.7% (vs 47.6% for GPT‑5.4)
- CyberGym: 81.8% (vs 79.0% for GPT‑5.4)
(See post for full table including Claude Opus 4.7 and Gemini 3.1 Pro comparisons.)
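To make the size of each gap easier to compare, the figures listed above can be turned into percentage-point differences. The snippet below is only an illustrative sketch: the names `SCORES` and `point_gains` are made up for this note, and the numbers are copied from the list above rather than from the full table in the post.

```python
# Scores (percent) copied from the list above: (GPT-5.5, GPT-5.4).
# BrowseComp is omitted because no GPT-5.4 figure is listed for it.
SCORES = {
    "Terminal-Bench 2.0": (82.7, 75.1),
    "OSWorld-Verified": (78.7, 75.0),
    "FrontierMath Tier 1-3": (51.7, 47.6),
    "CyberGym": (81.8, 79.0),
}


def point_gains(scores):
    """Return the GPT-5.5 minus GPT-5.4 gap in percentage points, per benchmark."""
    return {name: round(new - old, 1) for name, (new, old) in scores.items()}


gains = point_gains(SCORES)

# Print the benchmarks from largest to smallest gap.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: +{gain:.1f} points")
```

On these figures the largest gap is Terminal‑Bench 2.0 (7.6 points) and the smallest is CyberGym (2.8 points). These are absolute differences on separate benchmarks, so they indicate where the post reports the most movement but are not directly comparable with one another.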