When users say “the model got worse,” the uncomfortable possibility is that your harness did. Anthropic published a detailed postmortem on April 23 explaining why Claude Code felt degraded for weeks—and what changed to fix it.
Key takeaways
- Anthropic attributes most complaints to three overlapping changes in Claude Code’s harness (not a single model regression).
- All issues are reported as resolved as of Apr 20 in Claude Code v2.1.116.
- If you’re running internal “Codex-like” workflows, this is a cautionary tale: defaults, caching, and context management can silently erode outcomes.
What actually went wrong (high-level)
- Defaults: small changes to reasoning or system instructions can trade latency for quality without obvious release signals.
- Context/thinking lifecycle: clearing or truncating “older thinking” to reduce latency can change how the agent behaves after idle time.
- Cross-component bugs: issues can sit in the intersection of context management, extended thinking, and API behavior.
Action checklist for teams
- Record your exact toolchain version (client, SDK, prompts) whenever you ship a workflow change.
- Keep an internal eval suite that detects 2–5% quality drops before rollout.
- Separate “model changes” from “harness changes” in your incident process and postmortems.
