# Codex Models
Compare GPT-5.4, GPT-5.3-Codex, and GPT-5.3-Codex-Spark: capabilities, context windows, pricing, and which model to use for which task.
Codex supports multiple models optimized for different coding tasks. This guide helps you choose the right model for your workflow.
## Model Overview

| Model | Context | Best For | Availability |
|-------|---------|----------|--------------|
| GPT-5.4 | 1M (experimental) | Mixed workflows, recommended default | All plans |
| GPT-5.3-Codex | 400K+ | Pure coding, intensive programming | All plans |
| GPT-5.3-Codex-Spark | 128K | Quick edits, rapid prototyping | Pro plan (research preview) |
| GPT-5.2-Codex | 200K | CI/CD code review (still recommended for review pipelines) | API |
## GPT-5.4: The Recommended Default
GPT-5.4 is OpenAI's unified flagship model. It combines coding, reasoning, tool use, and computer control in one model.
### Strengths
- 47% fewer reasoning tokens than GPT-5.3-Codex for tool-heavy workflows
- Native computer use — Can operate browsers and desktop applications
- Strongest general reasoning — Best for tasks that mix code with planning
- 1M context window (experimental) — Handle massive codebases
### When to Use It
- Your default model for most Codex tasks
- Mixed workflows involving code + reasoning + tools
- Tasks that require understanding context beyond just code
- Agent workflows with frequent tool calls
```toml
# ~/.codex/config.toml
model = "gpt-5.4"
```
## GPT-5.3-Codex: The Coding Specialist
GPT-5.3-Codex is optimized specifically for software engineering. It was trained on complex, real-world engineering tasks.
### Strengths
- State-of-the-art SWE-Bench Pro — Spans four programming languages
- Terminal-Bench 2.0 leader at 77.3% — Best for terminal/DevOps work
- 25% faster than its predecessor (GPT-5.2-Codex)
- More cost-effective for pure coding than GPT-5.4
- 400K+ context window — Handles large codebases
### When to Use It
- Intensive programming tasks (feature development, refactoring)
- Terminal-native workflows (DevOps, scripting, CLI tools)
- Budget-sensitive projects (lower per-token cost)
- Code review (set as `review_model`)

```toml
# ~/.codex/config.toml
model = "gpt-5.3-codex"

# Or set just for review
review_model = "gpt-5.3-codex"
```
### Benchmarks

| Benchmark | GPT-5.3-Codex | GPT-5.4 |
|-----------|---------------|---------|
| SWE-Bench Pro | 56.8% | Comparable |
| Terminal-Bench 2.0 | 77.3% | Lower |
| Token efficiency | Baseline | 47% fewer on tool-heavy tasks |
## GPT-5.3-Codex-Spark: The Speed Demon
Spark is a distilled, Cerebras-accelerated variant designed for near-instant responses at over 1,000 tokens per second.
### Strengths
- 15x faster generation than standard GPT-5.3-Codex
- Near-instant responses — Keeps you in flow
- Great for small tasks — Quick edits, fixes, explanations
- Powered by Cerebras — Ultra-low latency hardware
### Limitations
Spark is not a replacement for GPT-5.3-Codex. It trades reasoning depth for speed:
- 128K context (vs 400K+ for full model)
- Weaker on multi-step reasoning and complex debugging
- Known hallucination issues — Fabricated API endpoints, phantom packages
- Terminal-Bench estimate ~58.4% (vs 77.3% for full model)
- Unreliable structured output in some cases
### When to Use It

- Quick explanations and code snippets
- Simple bug fixes with clear errors
- Rapid prototyping iterations
- Tasks where speed matters more than depth

```toml
# ~/.codex/config.toml
model = "gpt-5.3-codex-spark"
```
## Model Selection Guide

| Task Type | Recommended Model | Why |
|-----------|-------------------|-----|
| Daily development | GPT-5.4 | Best all-around |
| Complex feature | GPT-5.3-Codex | Strongest coding |
| Quick fix | GPT-5.3-Codex-Spark | Fastest response |
| Code review | GPT-5.3-Codex | Best accuracy |
| DevOps/terminal work | GPT-5.3-Codex | Terminal-Bench leader |
| Mixed code + reasoning | GPT-5.4 | Unified capabilities |
| Budget-conscious work | GPT-5.3-Codex | Lower per-token cost |
| CI/CD pipelines | GPT-5.2-Codex or GPT-5.3-Codex | Reliable, cost-effective |
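If you script Codex in a wrapper, the guidance above can be captured as a small lookup table. This sketch is purely illustrative: the task labels and the `pick_model` helper are not part of the Codex CLI, just one way to encode the recommendations.

```python
# Illustrative mapping of task type -> model, mirroring the selection guide.
MODEL_FOR_TASK = {
    "daily": "gpt-5.4",
    "complex-feature": "gpt-5.3-codex",
    "quick-fix": "gpt-5.3-codex-spark",
    "review": "gpt-5.3-codex",
    "devops": "gpt-5.3-codex",
    "ci": "gpt-5.2-codex",
}

def pick_model(task: str) -> str:
    """Return the recommended model for a task type.

    Unknown task types fall back to GPT-5.4, the recommended default.
    """
    return MODEL_FOR_TASK.get(task, "gpt-5.4")

print(pick_model("quick-fix"))  # gpt-5.3-codex-spark
```

A wrapper could then pass the result through, e.g. `codex -c model="{pick_model(task)}" ...`.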
## API Pricing

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Caching Discount |
|-------|-----------------------|------------------------|------------------|
| GPT-5.4 | $1.25 | $10.00 | — |
| GPT-5.3-Codex | $1.50 | $6.00 | 75% prompt caching |
GPT-5.3-Codex's 75% prompt caching discount makes it significantly cheaper for repetitive tasks like CI/CD reviews where the same codebase context is sent repeatedly.
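As rough arithmetic with the prices above: suppose a review pipeline resends the same 200K-token codebase context 20 times, producing 5K output tokens per run. The token counts, the `review_cost` helper, and the simplification that only the first run pays the full input rate are all illustrative assumptions, not billing guarantees.

```python
def review_cost(runs, context_tokens, output_tokens,
                input_price, output_price, cache_discount=0.0):
    """Estimate total API cost in dollars for `runs` jobs that each
    resend the same prompt context.

    Simplifying assumption: the first run pays the full input rate,
    and every later run pays the cache-discounted rate on the shared
    context. Output tokens are always billed at the full rate.
    """
    per_m = 1_000_000
    first = context_tokens * input_price / per_m
    rest = (runs - 1) * context_tokens * input_price * (1 - cache_discount) / per_m
    out = runs * output_tokens * output_price / per_m
    return first + rest + out

# GPT-5.3-Codex: $1.50 in / $6.00 out, 75% prompt caching discount
codex = review_cost(20, 200_000, 5_000, 1.50, 6.00, cache_discount=0.75)
# GPT-5.4: $1.25 in / $10.00 out, no caching discount listed
flagship = review_cost(20, 200_000, 5_000, 1.25, 10.00)
print(f"GPT-5.3-Codex: ${codex:.2f}, GPT-5.4: ${flagship:.2f}")
```

Under these assumptions the cached GPT-5.3-Codex run costs roughly a third of the GPT-5.4 run, which is the effect the caching discount is meant to capture.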
## Switching Models

### In config.toml
```toml
model = "gpt-5.4"               # Default model
review_model = "gpt-5.3-codex"  # Review-specific model
```
### At the Command Line

```bash
codex -c model="gpt-5.3-codex" "Refactor the auth module"
```
### In the TUI

Use the `/model` slash command to switch models during a session.
### Per-Profile

```toml
[profiles.deep-work]
model = "gpt-5.3-codex"
model_reasoning_effort = "high"

[profiles.quick-tasks]
model = "gpt-5.3-codex-spark"
```
## The Evolution of Codex Models

| Release | Model | Key Improvement |
|---------|-------|-----------------|
| Dec 2025 | GPT-5.2-Codex | Strong agentic coding |
| Feb 2026 | GPT-5.3-Codex | 25% faster, SOTA SWE-Bench Pro |
| Feb 2026 | GPT-5.3-Codex-Spark | 15x speed via Cerebras partnership |
| Mar 2026 | GPT-5.4 | Unified coding + reasoning + computer use |
## Next Steps
- Codex vs Claude Code — Cross-platform comparison
- Codex vs Cursor — IDE comparison
- Configuration — Set your default model