For the latest episode of the OpenAI Podcast I sat down with OpenAI president and co-founder Greg Brockman and Codex engineering lead Thibault Sottiaux to talk about the release of OpenAI’s new GPT-5-Codex model. It was great to hear Greg and Thibault explain OpenAI’s approach to coding models and agents, and where things are headed. These are some of my takeaways from the conversation:
GPT-5-Codex can think for up to 7 hours about a problem
One of the new ways to evaluate models is by how long they can think about a problem and still arrive at a correct solution. The introduction of OpenAI’s “reasoning” model o1 (just one year ago) established a new paradigm: models could provide smarter answers not just through scaling or better data quality, but by spending longer on a problem, breaking it down and performing what we would call “System 2 thinking.” The term comes from Daniel Kahneman’s book “Thinking, Fast and Slow,” which outlines two distinct modes of thought: System 1 (fast, intuitive) and System 2 (slow, deliberate). The breakthrough with o1 was enabling AI to engage in this more methodical, step-by-step approach to problem-solving rather than generating answers in a single pass. GPT-5-Codex takes this to a new level, dedicating up to 7 hours of compute time to particularly complex programming challenges.
What are the implications of this?
We’re only just now beginning to understand what kind of complex problems are solvable by today’s AI systems. We’re going to see this benchmark pushed even further as OpenAI and other labs explore the full potential of thinking longer. We’ll also start asking not only how big a model is, but how much time it can spend on a problem. Solving problems faster with less compute is the ultimate goal, but the ability for a model to think for a long time introduces entirely new kinds of problems we can throw at AI.
Codex was an internal tool at OpenAI
OpenAI has had to scale at an unheard-of pace. When I first started at OpenAI in 2020 there were around 150 people – this included both research and the newly formed API team that was supporting commercial deployment. While OpenAI has grown to thousands of employees today, it’s still incredibly small compared to Google, Meta and Microsoft – yet it’s supporting over 700 million weekly active users. To handle this growth OpenAI started by using whatever off-the-shelf solutions were available, then gradually began developing in-house solutions. Besides putting GPT-4 to work handling Slack messages and customer support, they began creating their own internal tools. The Codex CLI (a tool that runs in the command line and allows you to call different models) evolved from an early experimental tool called “10x.” By putting it to use internally and gathering feedback from hundreds of OpenAI engineers, they were able to iterate and improve it to the point that it became a vital part of their infrastructure.
Codex has evolved from just a model that generated code into an entire suite of tools. You can download the CLI and use it in your terminal. You can install the Codex extension and connect it to your OpenAI account, or you can run Codex from ChatGPT and connect it to GitHub.
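To make that concrete, here is the typical setup path for the terminal route – the CLI is distributed through npm (this assumes you already have Node.js installed; shown as a sketch, not official install docs):

```shell
# Install the Codex CLI globally (requires Node.js/npm)
npm install -g @openai/codex

# Launch an interactive session in the current repository;
# on first run it will walk you through signing in with your OpenAI account
codex
```

From there the same underlying model is reachable through the IDE extension or from ChatGPT’s Codex surface connected to your GitHub repos.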
At OpenAI, Codex isn’t just used to write code; they also use it to perform code reviews. At first people were concerned this would create more noise than signal, but engineers soon came to appreciate that not only did it help them squash bugs, its ability to explain code also helped them understand the codebase better. (As a newly hired engineer at OpenAI I remember being a bit intimidated by the sprawling codebase, which covered everything from advanced reinforcement learning tools to blog posts!)
What happens next?
I’ve heard that people coming to OpenAI from other organizations were surprised by how much OpenAI was using AI to accelerate its own growth. This has been a common theme from all of the executives at OpenAI I’ve spoken to. And it’s not just the engineers who are using these tools. People on various teams are now learning the basics of coding and putting Codex to work to solve their own specific use cases. I expect other companies will have to follow suit if they want to keep up – and that extends not just to AI companies, but to any organization.
Coding in the cloud is a new frontier
One of the features of Codex is the ability to have tasks run remotely while you focus on other things. For example, you could send a huge task – like refactoring a codebase from Python to Rust – to OpenAI’s servers while launching other tasks from your terminal. Most of us are conditioned to work on one task at a time, or at most to juggle a few terminals as we go back and forth. Greg Brockman pointed out that this was a new feature people were only just beginning to understand.
Where will this go?
In all likelihood most coding will eventually be done in the cloud, and only a small percentage will happen in real time as people work in IDEs. To get there we have to think of ourselves as project managers and broaden the scope of what we take on. This will take some adjusting. Part of the reason a lot of us like to code is its hands-on nature, which is also part of why “vibe coding” has taken off as more people discover the joy of making something that works. Cloud code generation doesn’t mean the end of vibe coding – it just means we’ll be able to vibe code much more elaborate projects as we deploy agents to handle all the smaller tasks.
GPT-5-Codex is designed to work well with coding tools
One of the things that has become obvious since the launch of the first code models is that simply copy-pasting model responses into a codebase isn’t the entire solution. Tools like Copilot, Cursor, Claude Code and Codex have shown that how you use the model is almost as important as the model’s core capabilities. A “harness” around the model can let it outperform its base evals. Effective AGENTS.md files (instructions for the model), code summarization and intelligent context tools can help models perform much more sophisticated tasks.
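For readers who haven’t seen one, an instructions file of this kind is just plain Markdown at the root of a repository. This example is purely illustrative – the project details are invented – but it shows the shape of the convention:

```markdown
# AGENTS.md

## Project overview
Backend service for order processing. Source lives in `src/`,
tests in `tests/`.

## Conventions
- Python 3.12, formatted with black, linted with ruff.
- All new code needs type hints and a unit test.

## Commands
- Run tests: `pytest -q`
- Type check: `mypy src/`

## Boundaries
- Never edit files under `migrations/` by hand.
```

Because the harness injects this file into the model’s context before each task, the model follows project-specific rules without the user restating them every time.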
OpenAI developed GPT-5-Codex to work in a variety of harnesses, from the Codex CLI to third-party tools. While GPT-5 already outperformed other models on long-horizon tasks, training a specialized model to be more flexible about its environment should yield some very interesting results.
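To illustrate what a “harness” actually does, here is a toy sketch of the pattern: gather the repo’s instructions file, assemble it with code context and the user’s task into a prompt, and send that to a model. This is not OpenAI’s implementation – the model call is stubbed out, and all function names are my own – but the structure is the part that matters:

```python
# Toy sketch of a coding "harness": the scaffolding around a model
# that supplies project instructions and context with each request.
# fake_model() is a stand-in; a real harness would call an LLM API.
from pathlib import Path


def load_instructions(repo_root: str) -> str:
    """Read per-repo instructions (the AGENTS.md convention) if present."""
    path = Path(repo_root) / "AGENTS.md"
    return path.read_text() if path.exists() else ""


def build_prompt(instructions: str, file_context: str, task: str) -> str:
    """Combine repo instructions, relevant code, and the user's task."""
    parts = []
    if instructions:
        parts.append("Project instructions:\n" + instructions)
    parts.append("Relevant code:\n" + file_context)
    parts.append("Task:\n" + task)
    return "\n\n".join(parts)


def fake_model(prompt: str) -> str:
    """Stand-in for a model call; returns a canned suggestion."""
    return "PATCH: rename variable x to total"


def run_task(repo_root: str, file_context: str, task: str) -> str:
    """One harness iteration: build the prompt, query the model."""
    prompt = build_prompt(load_instructions(repo_root), file_context, task)
    return fake_model(prompt)
```

A real harness loops: it applies the model’s patch, runs the project’s tests, and feeds failures back into the next prompt – which is exactly where the quality of the scaffolding starts to matter as much as the raw model.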
What’s next?
Although I worked at OpenAI and host their podcast, I personally use a variety of models, and my AI deployment firm Interdimensional is model agnostic. I was using Claude Sonnet extensively in Cursor for many of my coding tasks (along with o4-mini-high). With the introduction of GPT-5 I switched over to GPT-5-high exclusively. I found it very capable in Codex CLI, and it handled tasks that Sonnet and Opus couldn’t. I’m very curious to see how GPT-5-Codex works out after more testing, because GPT-5 was already amazing. However, Anthropic, Google and xAI aren’t sitting still. As Sam Altman likes to say, “For every move you make, they get a counter move.”