This Is the First AI That Helped Build Itself

Image source: https://openai.com/index/intr…

TL;DR

GPT-5.3-Codex, OpenAI's latest agentic coding model, improves on GPT-5.2-Codex in coding performance and reasoning while running 25% faster. It posts strong results on industry benchmarks such as SWE-Bench Pro and Terminal-Bench, supports the full software lifecycle, and can autonomously build complex applications. It is available through paid ChatGPT plans in the Codex app, CLI, IDE extension, and web, with API access planned.

Key Points

GPT-5.3-Codex significantly enhances coding performance, reasoning, and professional knowledge capabilities compared to its predecessor, GPT-5.2-Codex, and operates 25% faster.

The model performs strongly on industry benchmarks such as SWE-Bench Pro and Terminal-Bench, demonstrating strong coding and agentic abilities across multiple programming languages and complex tasks.

GPT-5.3-Codex can autonomously build complex applications, including games and apps, by iterating over millions of tokens and handling long-running and intricate projects.

Beyond coding, GPT-5.3-Codex supports the entire software lifecycle, assisting with tasks like debugging, deploying, monitoring, writing PRDs, and conducting user research.

The model handles simple or underspecified prompts better, choosing more functional and sensible defaults.

OpenAI has released GPT-5.3-Codex, an updated version of GPT-5.2-Codex (itself a specialized variant of GPT-5.2 optimized for agentic coding). The new model runs 25% faster and combines stronger frontier-level coding performance with improved reasoning and professional knowledge. It is designed for long-running tasks that mix research, tool use, and complex execution, and it delivers strong results on industry benchmarks such as SWE-Bench Pro and Terminal-Bench.

Beyond raw coding ability, GPT-5.3-Codex represents a shift in scope. It is positioned not just as a code-writing assistant, but as a general-purpose, computer-using agent capable of operating across terminals, browsers, IDEs, and desktop environments. This broader capability is reflected in its performance on agentic benchmarks like OSWorld, which measure real-world computer use rather than isolated code generation. The model can autonomously build and iterate on complex applications, including full web games, and it shows improved intent understanding, producing more complete and production-ready results from underspecified prompts.

GPT-5.3-Codex is built to support the full software lifecycle. In addition to writing and reviewing code, it assists with debugging, deploying, monitoring, writing product requirement documents (PRDs), creating tests, analyzing metrics, and performing user research. Its professional knowledge capabilities are reflected in strong results on GDPval, an evaluation that measures performance on well-specified knowledge-work tasks spanning 44 occupations.

A key change in this release is how developers interact with the model. GPT-5.3-Codex supports interactive steering, allowing users to guide the agent while it is working. Instead of waiting for a final output, users can receive progress updates, ask questions, adjust direction, and provide feedback without losing context. This positions Codex more like a collaborative teammate than a fire-and-forget automation tool.

Notably, GPT-5.3-Codex was also used extensively in its own development. Early versions helped debug training runs, analyze evaluation anomalies, build data pipelines, and optimize deployment infrastructure. During launch, the model assisted with scaling GPU clusters and stabilizing latency under traffic surges. This marks a rare example of an AI system materially accelerating its own research and deployment.

Cybersecurity is another major focus of this release. GPT-5.3-Codex is the first OpenAI model classified as “High capability” for cybersecurity-related tasks under the Preparedness Framework. While OpenAI reports no evidence that it can fully automate end-to-end cyber attacks, the model has been directly trained to identify software vulnerabilities. As a result, it is deployed with strengthened safeguards, including safety training, automated monitoring, trusted access controls, and enforcement pipelines. OpenAI has also committed $10 million in API credits to accelerate defensive cybersecurity research and ecosystem resilience.

GPT-5.3-Codex is available today through paid ChatGPT plans, everywhere Codex is supported: the app, CLI, IDE extension, and web. API access is planned but not yet generally available. The model is trained and served on NVIDIA systems.

Key Numbers

25% faster

The speed improvement of GPT-5.3-Codex compared to its predecessor.

10 Million USD

The financial commitment in API credits to accelerate cyber defense.

4 Languages

The number of programming languages covered by SWE-Bench Pro.

44 Occupations

The number of occupations covered by the GDPval benchmark.

2025

The year GDPval was released by OpenAI.

1 Million USD

The funding committed through OpenAI's original Cybersecurity Grant Program, launched in 2023.

Stakeholder Relationships

Organizations

NVIDIA Technology Company

NVIDIA is a technology company known for its graphics processing units and AI hardware. GPT-5.3-Codex is trained and served on NVIDIA systems.

Tools

GPT-5.3-Codex AI Model

GPT-5.3-Codex is OpenAI's agentic coding model, designed to improve coding performance, reasoning, and professional knowledge over its predecessor.

GPT-5.2-Codex AI Model

GPT-5.2-Codex is the predecessor to GPT-5.3-Codex, a specialized variant of GPT-5.2 optimized for agentic coding.

SWE-Bench Pro Benchmark Tool

SWE-Bench Pro is a benchmark that evaluates models on real-world software engineering tasks across four programming languages.

Terminal-Bench Benchmark Tool

Terminal-Bench is a benchmark that measures how well AI agents complete tasks in a terminal environment.

OSWorld Benchmark Tool

OSWorld is a benchmark that measures how well AI agents use a computer in real desktop environments, rather than isolated code generation.

GDPval Benchmark Tool

GDPval is an OpenAI evaluation that measures model performance on well-specified knowledge-work tasks across 44 occupations.

Codex app Application

The Codex app is one of the surfaces, alongside the CLI, IDE extension, and web, where GPT-5.3-Codex is available to paid ChatGPT users.

Timeline of Events

Timeline of key events and milestones.

2023 Launch of the $1M Cybersecurity Grant Program

OpenAI launched a program providing grants totaling $1 million to support cybersecurity initiatives.

2025 Release of GDPval

GDPval was released as an evaluation tool to measure a model’s performance on well-specified knowledge-work tasks across 44 occupations.

Recent months (prior to 2026) Model performance gains on cybersecurity tasks

Models showed meaningful gains on cybersecurity tasks, and OpenAI prepared strengthened cyber safeguards in response.

February 5, 2026 Availability of GPT-5.3-Codex

GPT-5.3-Codex is available with paid ChatGPT plans, running 25% faster due to infrastructure improvements, and is classified as high capability for cybersecurity-related tasks.

Future (planned) Safe enablement of API access for GPT-5.3-Codex

Plans are in place to safely enable API access for GPT-5.3-Codex.