OpenAI has launched the GPT-5.4 Thinking and Pro AI models in ChatGPT, completely skipping GPT-5.3 Thinking and Pro. The new ChatGPT-5.4 is OpenAI's most capable, efficient, and consequential release to date, rolling out now across ChatGPT, the API, and Codex.
For months, OpenAI's capabilities were spread across a constellation of specialized models.
- GPT-5.3-Codex handled software engineering.
- GPT-5.2 carried the reasoning torch for knowledge work.
GPT-5.4 consolidates everything into a single, unified frontier model, combining the coding capabilities of GPT-5.3-Codex with the improved reasoning of GPT-5.2.
GPT-5.4 brings the industry-leading coding capabilities previously exclusive to the Codex series together with improved general reasoning and agentic workflow performance. It is also OpenAI's first general-purpose model with native computer-use capabilities, allowing agents to operate software and websites more directly.
A few headline improvements deserve attention:
- Stronger knowledge work performance: GPT-5.4 outperformed GPT-5.2 on GDPval and on OpenAI's internal spreadsheet and presentation evaluations.
- More capable computer use: On OSWorld-Verified, GPT-5.4 scored 75.0%, versus 47.3% for GPT-5.2.
- Competitive coding gains: On SWE-Bench Pro (Public), GPT-5.4 scored 57.7%, which is slightly above GPT-5.3-Codex at 56.8% and GPT-5.2 at 55.6%.
- Better tool-based workflows: On BrowseComp, GPT-5.4 reached 82.7%, compared with 65.8% for GPT-5.2.
- Lower token waste: OpenAI says GPT-5.4 is its most token-efficient reasoning model yet, using fewer tokens than GPT-5.2 on many tasks.
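To put the benchmark numbers above side by side, here is a minimal Python sketch that computes the percentage-point gain of GPT-5.4 over GPT-5.2. The scores are the ones quoted in the list; the `scores` dictionary and the `point_gain` helper are names I made up for illustration and are not part of any OpenAI tooling.

```python
# Scores (in percent) as quoted in the list above.
scores = {
    "OSWorld-Verified": {"GPT-5.4": 75.0, "GPT-5.2": 47.3},
    "SWE-Bench Pro (Public)": {"GPT-5.4": 57.7, "GPT-5.2": 55.6},
    "BrowseComp": {"GPT-5.4": 82.7, "GPT-5.2": 65.8},
}


def point_gain(benchmark: str) -> float:
    """Percentage-point gain of GPT-5.4 over GPT-5.2 on one benchmark."""
    row = scores[benchmark]
    # Round to one decimal to hide floating-point noise in the subtraction.
    return round(row["GPT-5.4"] - row["GPT-5.2"], 1)


gains = {name: point_gain(name) for name in scores}

for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

The gap is widest on computer use (27.7 points) and browsing (16.9 points), while the coding gain over GPT-5.2 is a more modest 2.1 points.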

Testing the new ChatGPT-5.4 with Computer Use:
We already know that new models are, on paper, more capable than their predecessors, but how does that translate into real-world performance? Below are a few tests I ran to see how the new ChatGPT-5.4 handles real-world tasks.
1. Research:
ChatGPT already has deep research; however, I wanted to test how GPT-5.4 would perform in a research task with strict instructions. OpenAI claims GPT‑5.4 Thinking also improves deep web research, particularly for highly specific queries, while better maintaining context for questions that require longer thinking.
Prompt: You are my research analyst. I need a deeply sourced answer to this question: "What are the AI agents? What are the benefits of an AI agent? What are the measures taken by AI companies to reduce AI hallucinations over time, and is there an AI bubble?"
Rules:
- Browse the web and use at least 10 sources.
- At least 5 sources must be primary or near primary (lab posts, eval repos, academic papers, official docs).
- Build a 2 column list:
Column 1: Major AI companies
Column 2: Measures taken by the AI companies to reduce AI hallucinations.
- Include a short section called "The AI bubble" with at least 5 concrete examples.
- End with a simple checklist I can use to judge claims in 2 minutes.
Output format:
- Your upfront plan (max 12 lines)
- Findings with citations after every important claim
- Final checklist
Verdict:
The output was much better than I expected, and ChatGPT-5.4 Thinking took only 1 minute and 44 seconds to finish reasoning.
I also wanted to show the new feature that lets users add to or change the prompt's context on the fly. In the original prompt I asked for only 10 sources; later, I raised that to 20 without having to pause and restart the entire process.
2. Turning research into a PowerPoint presentation:
Next, I wanted to check if the new model could convert the research into a PowerPoint presentation.
Prompt: Turn the research into a 10-slide PowerPoint presentation. Keep the content concise, but make sure nothing is missing. Make the PowerPoint look creative, engaging, and professional.
Verdict:
It took ChatGPT-5.4 Thinking 22 minutes and 45 seconds to complete the whole presentation. That was slower than I expected, though still faster than a person building the deck by hand. I personally like the theme it went for, and you can download the presentation in PPTX format and edit it.
3. Vibe coding:
Vibe coding is still very popular among both technical and non-technical users, and on SWE-Bench Pro (Public), GPT-5.4 scores slightly above both GPT-5.3-Codex and GPT-5.2. Let's put the new model into action.
Prompt: Build a 3D chess game where you can click and move the pieces. Make the chess game beautiful with marble and glass, with realistic lighting effects. There would be multiple chess designs to choose from, as well as a background and timer.
Verdict:
ChatGPT-5.4 wrote 585 lines of code in less than 3 minutes, which is very impressive. The chess game itself was fairly decent, with good animation, although I believe GPT-5.4 could have done better with colours and customization.
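For a rough sense of scale, the timings from the three tests can be compared with a few lines of Python. This is only back-of-the-envelope arithmetic on the figures I reported above. The `to_seconds` helper is a name I made up, and because the coding test took "less than 3 minutes", the 180-second figure is an upper bound, so the lines-per-second value is a minimum.

```python
def to_seconds(minutes: int, seconds: int = 0) -> int:
    """Convert a minutes-and-seconds duration to total seconds."""
    return minutes * 60 + seconds


research_s = to_seconds(1, 44)    # research test: 1 min 44 s
slides_s = to_seconds(22, 45)     # presentation test: 22 min 45 s
coding_s_max = to_seconds(3)      # coding test: under 3 min (upper bound)

lines_of_code = 585

# Lower bound on coding throughput, since the real time was under 180 s.
min_lines_per_second = lines_of_code / coding_s_max

# How many times longer the presentation took than the research task.
slides_vs_research = slides_s / research_s

print(f"Research: {research_s} s, presentation: {slides_s} s")
print(f"Coding: at least {min_lines_per_second:.2f} lines per second")
print(f"Presentation took about {slides_vs_research:.1f}x as long as research")
```

So the presentation took about 13 times as long as the research task, while the coding test produced more than three lines of code per second.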
Editor's note:
I like the new ChatGPT-5.4, and the highlight for me wasn't its power, speed, or vibe coding ability. It was the ability to add or remove context while the model is thinking, without needing to pause or restart; that's the MVP of this update for me. Overall, GPT-5.4 is a very powerful new AI model and is much faster than the previous GPT-5.2. If you are a ChatGPT user, you'll enjoy the speed and power.