Codex Models

Comparison · Intermediate · 8 min read · Verified Mar 8, 2026

Compare GPT-5.4, GPT-5.3-Codex, and GPT-5.3-Codex-Spark: capabilities, context windows, pricing, and which model to use for which task.

Tags: codex, models, gpt-5, comparison, pricing, benchmarks

Codex supports multiple models optimized for different coding tasks. This guide helps you choose the right model for your workflow.

Model Overview#

| Model | Context | Best For | Availability |
|-------|---------|----------|--------------|
| GPT-5.4 | 1M (experimental) | Mixed workflows, recommended default | All plans |
| GPT-5.3-Codex | 400K+ | Pure coding, intensive programming | All plans |
| GPT-5.3-Codex-Spark | 128K | Quick edits, rapid prototyping | Pro plan (research preview) |
| GPT-5.2-Codex | 200K | CI/CD code review (still recommended for review pipelines) | API |

GPT-5.4: The Unified Flagship#

GPT-5.4 is OpenAI's unified flagship model. It combines coding, reasoning, tool use, and computer control in a single model.

Strengths#

  • 47% fewer reasoning tokens than GPT-5.3-Codex for tool-heavy workflows
  • Native computer use — Can operate browsers and desktop applications
  • Strongest general reasoning — Best for tasks that mix code with planning
  • 1M context window (experimental) — Handle massive codebases
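
Whether the 1M window matters for your project comes down to rough token math. As a sketch (the ~4 characters per token figure is a common heuristic, not an exact tokenizer rate, and the codebase size is hypothetical):

```python
# Rough estimate of whether a codebase fits in a context window,
# using the common ~4 characters-per-token heuristic (illustrative only;
# actual tokenization varies by language and file content).
CHARS_PER_TOKEN = 4

def fits_in_context(total_chars: int, context_tokens: int) -> bool:
    """Return True if a codebase of total_chars likely fits in the window."""
    return total_chars / CHARS_PER_TOKEN <= context_tokens

# A ~2 MB codebase works out to roughly 500K estimated tokens:
codebase_chars = 2_000_000
print(fits_in_context(codebase_chars, 1_000_000))  # GPT-5.4's 1M window
print(fits_in_context(codebase_chars, 400_000))    # GPT-5.3-Codex's 400K window
```

By this estimate, a codebase that overflows the 400K window can still fit comfortably in the experimental 1M window.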

When to Use It#

  • Your default model for most Codex tasks
  • Mixed workflows involving code + reasoning + tools
  • Tasks that require understanding context beyond just code
  • Agent workflows with frequent tool calls
```toml
# ~/.codex/config.toml
model = "gpt-5.4"
```

GPT-5.3-Codex: The Coding Specialist#

GPT-5.3-Codex is optimized specifically for software engineering. It was trained on complex, real-world engineering tasks.

Strengths#

  • State-of-the-art SWE-Bench Pro — Spans four programming languages
  • Terminal-Bench 2.0 leader at 77.3% — Best for terminal/DevOps work
  • 25% faster than its predecessor (GPT-5.2-Codex)
  • More cost-effective for pure coding than GPT-5.4
  • 400K+ context window — Handles large codebases

When to Use It#

  • Intensive programming tasks (feature development, refactoring)
  • Terminal-native workflows (DevOps, scripting, CLI tools)
  • Budget-sensitive projects (lower per-token cost)
  • Code review (set as review_model)
```toml
model = "gpt-5.3-codex"

# Or set just for review
review_model = "gpt-5.3-codex"
```

Benchmarks#

| Benchmark | GPT-5.3-Codex | GPT-5.4 |
|-----------|---------------|---------|
| SWE-Bench Pro | 56.8% | Comparable |
| Terminal-Bench 2.0 | 77.3% | Lower |
| Token efficiency | Baseline | 47% fewer on tool-heavy tasks |
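
The token-efficiency row can offset GPT-5.4's higher output price. A back-of-envelope comparison, using the 47% figure above and the per-token prices from the API Pricing section (the 100K-token baseline is hypothetical):

```python
# Illustrative output-cost comparison for a tool-heavy task:
# GPT-5.4 emits ~47% fewer reasoning tokens, but its output costs
# $10.00/M vs $6.00/M for GPT-5.3-Codex.
baseline_tokens = 100_000  # assumed reasoning tokens for GPT-5.3-Codex

codex_cost = baseline_tokens / 1e6 * 6.00
gpt54_tokens = baseline_tokens * (1 - 0.47)
gpt54_cost = gpt54_tokens / 1e6 * 10.00

print(f"GPT-5.3-Codex: ${codex_cost:.2f}")
print(f"GPT-5.4:       ${gpt54_cost:.2f}")
```

Under these assumptions the two roughly break even on tool-heavy work, which is why GPT-5.3-Codex's advantage shows up mainly on pure coding tasks with less reasoning overhead.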

GPT-5.3-Codex-Spark: The Speed Demon#

Spark is a distilled, Cerebras-accelerated variant designed for near-instant responses at over 1,000 tokens per second.

Strengths#

  • 15x faster generation than standard GPT-5.3-Codex
  • Near-instant responses — Keeps you in flow
  • Great for small tasks — Quick edits, fixes, explanations
  • Powered by Cerebras — Ultra-low latency hardware

Limitations#

Warning

Spark is not a replacement for GPT-5.3-Codex. It trades reasoning depth for speed:

  • 128K context (vs 400K+ for full model)
  • Weaker on multi-step reasoning and complex debugging
  • Known hallucination issues — Fabricated API endpoints, phantom packages
  • Terminal-Bench estimate ~58.4% (vs 77.3% for full model)
  • Unreliable structured output in some cases

When to Use It#

  • Quick explanations and code snippets
  • Simple bug fixes with clear errors
  • Rapid prototyping iterations
  • Tasks where speed matters more than depth
```toml
model = "gpt-5.3-codex-spark"
```
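
To put the speed difference in concrete terms: Spark's ~1,000 tokens/second is stated above, while the full model's rate below is inferred from the "15x faster" claim and is an assumption, not a published figure.

```python
# Back-of-envelope generation latency for a 2,000-token response.
SPARK_TPS = 1000           # from the article
FULL_TPS = SPARK_TPS / 15  # ~67 tok/s, inferred from the 15x claim

response_tokens = 2000
spark_seconds = response_tokens / SPARK_TPS
full_seconds = response_tokens / FULL_TPS

print(f"Spark: {spark_seconds:.0f}s, full model: {full_seconds:.0f}s")
```

A two-second response versus a half-minute wait is the difference Spark is optimizing for, at the cost of the reasoning depth noted in the limitations above.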

Model Selection Guide#

| Task Type | Recommended Model | Why |
|-----------|-------------------|-----|
| Daily development | GPT-5.4 | Best all-around |
| Complex feature | GPT-5.3-Codex | Strongest coding |
| Quick fix | GPT-5.3-Codex-Spark | Fastest response |
| Code review | GPT-5.3-Codex | Best accuracy |
| DevOps/terminal work | GPT-5.3-Codex | Terminal-Bench leader |
| Mixed code + reasoning | GPT-5.4 | Unified capabilities |
| Budget-conscious work | GPT-5.3-Codex | Lower per-token cost |
| CI/CD pipelines | GPT-5.2-Codex or GPT-5.3-Codex | Reliable, cost-effective |
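
If you script model selection (for example in a CI wrapper), the table above reduces to a simple lookup. A minimal sketch; the task names and mapping are illustrative, not an official API:

```python
# Mirror of the selection table: map a task type to a recommended model,
# falling back to the recommended default (GPT-5.4) for unknown tasks.
MODEL_FOR_TASK = {
    "daily development": "gpt-5.4",
    "complex feature": "gpt-5.3-codex",
    "quick fix": "gpt-5.3-codex-spark",
    "code review": "gpt-5.3-codex",
    "devops": "gpt-5.3-codex",
    "mixed code + reasoning": "gpt-5.4",
}

def pick_model(task: str, default: str = "gpt-5.4") -> str:
    """Return the recommended model for a task type."""
    return MODEL_FOR_TASK.get(task.lower(), default)

print(pick_model("Quick fix"))  # gpt-5.3-codex-spark
print(pick_model("something unusual"))  # falls back to gpt-5.4
```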

API Pricing#

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Caching Discount |
|-------|-----------------------|------------------------|------------------|
| GPT-5.4 | $1.25 | $10.00 | — |
| GPT-5.3-Codex | $1.50 | $6.00 | 75% prompt caching |

Tip

GPT-5.3-Codex's 75% prompt caching discount makes it significantly cheaper for repetitive tasks like CI/CD reviews where the same codebase context is sent repeatedly.
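
To see the effect, here is the arithmetic for a single hypothetical CI review using the prices above (the token counts are made up for illustration):

```python
# Effective cost of one CI review with GPT-5.3-Codex, applying the 75%
# prompt-caching discount to the cached share of input tokens.
INPUT_PER_M = 1.50
OUTPUT_PER_M = 6.00
CACHE_DISCOUNT = 0.75

cached_in, fresh_in, out = 150_000, 50_000, 5_000  # hypothetical token counts

cost = (
    fresh_in / 1e6 * INPUT_PER_M
    + cached_in / 1e6 * INPUT_PER_M * (1 - CACHE_DISCOUNT)
    + out / 1e6 * OUTPUT_PER_M
)
no_cache = (cached_in + fresh_in) / 1e6 * INPUT_PER_M + out / 1e6 * OUTPUT_PER_M

print(f"with caching: ${cost:.4f}, without: ${no_cache:.4f}")
```

With most of the codebase context cached, the review costs roughly half of the uncached price in this example, and the savings grow with the share of repeated context.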

Switching Models#

In config.toml#

```toml
model = "gpt-5.4"              # Default model
review_model = "gpt-5.3-codex" # Review-specific model
```

At the Command Line#

```bash
codex -c model="gpt-5.3-codex" "Refactor the auth module"
```

In the TUI#

Use the /model slash command to switch models during a session.

Per-Profile#

```toml
[profiles.deep-work]
model = "gpt-5.3-codex"
model_reasoning_effort = "high"

[profiles.quick-tasks]
model = "gpt-5.3-codex-spark"
```

The Evolution of Codex Models#

| Release | Model | Key Improvement |
|---------|-------|-----------------|
| Dec 2025 | GPT-5.2-Codex | Strong agentic coding |
| Feb 2026 | GPT-5.3-Codex | 25% faster, SOTA SWE-Bench Pro |
| Feb 2026 | GPT-5.3-Codex-Spark | 15x speed via Cerebras partnership |
| Mar 2026 | GPT-5.4 | Unified coding + reasoning + computer use |

Next Steps#