
ZDNET's key takeaways
- ChatGPT Codex wrote the code and saved me time.
- It also created a serious bug, but it was able to recover.
- Codex is still based on the GPT-4 LLM architecture.
Well, this is not vibe coding. I found the experience slow, cumbersome, stressful, and incomplete. But it all worked in the end.
ChatGPT Codex is ChatGPT's agentic tool dedicated to writing and modifying code. It can access your GitHub repository, make changes, and issue a pull request. You can then review the results and decide whether to merge them.
Also: How to get your GitHub codebase analyzed by ChatGPT Deep Research – and why you should
My primary development project is a PHP- and JavaScript-based WordPress security plugin. A core plugin is available for free, along with some add-on plugins that extend the core plugin's capabilities. My private development repo contains all of these, as well as some maintenance plugins I rely on for user support.
There are 431 files in this repo. This is the first time I've tried to get an AI to work across my entire ecosystem of plugins in a private repository. I previously used Jules to add a feature to the core plugin, but because it only had access to the core plugin's open-source repository, it couldn't take the entire product ecosystem into account.
Earlier last week, I decided to give ChatGPT Codex a run at my code. Then this happened.
GPT-5 released
On Thursday, GPT-5 hit the AI world like a freight train. Initially, OpenAI tried to force everyone to use the new model. It then added legacy model support after many of its customers went ballistic.
I ran GPT-5 against my set of programming tests, and it failed half of them. So I was especially curious whether Codex still used the GPT-4 architecture or would force developers onto GPT-5.
However, when I queried Codex five days after the GPT-5 launch, the AI replied that it was still based on "OpenAI's GPT-4 architecture."
I took two things from that:
- OpenAI is not ready to move Codex coding over to GPT-5 (which, remember, failed half of my tests).
- The results, conclusions, and screenshots I captured for my Codex tests are still valid, since Codex is still based on GPT-4.
With that said, here are my very-much-not-GPT-5 results with ChatGPT Codex.
Launch
My first step was to ask ChatGPT Codex to examine the codebase. I used Codex's Ask mode, which analyzes code but doesn't actually make changes.
I was expecting something as deep and comprehensive as the analysis I got a few months ago, but instead I got very little.
I decided a more effective approach would be to ask Codex to run a quick security audit and tell me whether there were any problems. Here's how I prompted it.
Identify any serious security concerns. Ignore the Anyone With Links, License Fixer, and Settings Nuker plugins. Anyone With Links is in a very early stage of coding, and the code isn't ready for review. License Fixer and Settings Nuker are special-purpose plugins that don't require a security audit.
Codex identified three main areas for improvement.
All three areas were valid, although I'm not ready to modify the serialized data structure at the moment, since I'm saving that work for a complete preferences overhaul. The $_POST complaint is being managed, just with a different approach than the one Codex suggested.
Also: Best AI for coding in 2025 (and what not to use)
The third area was worth changing: a cross-site request forgery (CSRF) risk. While access to the plugins' user interface is gated by login role, the plugins themselves don't explicitly check that the person submitting a settings action is allowed to do so.
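For context, the standard WordPress defense against this kind of CSRF hole is a nonce check plus a capability check in the settings handler. Here's a rough sketch of that pattern; the function, nonce, and option names are my own illustrative placeholders, not the plugin's actual code or necessarily what Codex produced:

```php
<?php
// Illustrative WordPress CSRF hardening: verify a nonce and the user's
// capability before saving settings. All names here are hypothetical
// placeholders, not the plugin's real identifiers.
function myplugin_save_settings() {
    // Reject the request unless it carries a valid nonce for this action.
    if ( ! isset( $_POST['myplugin_nonce'] ) ||
         ! wp_verify_nonce( $_POST['myplugin_nonce'], 'myplugin_save_settings' ) ) {
        wp_die( 'Security check failed.' );
    }

    // Confirm the logged-in user is actually allowed to change settings,
    // rather than relying only on which admin screens they can see.
    if ( ! current_user_can( 'manage_options' ) ) {
        wp_die( 'Insufficient permissions.' );
    }

    // Sanitize the submitted value before saving it.
    $value = sanitize_text_field( wp_unslash( $_POST['myplugin_setting'] ?? '' ) );
    update_option( 'myplugin_setting', $value );
}
```

The matching settings form would emit the token with `wp_nonce_field( 'myplugin_save_settings', 'myplugin_nonce' )`, so only forms WordPress itself rendered for that user can pass the check.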
This is what I decided to ask Codex to fix.
Code fix
Next, I instructed Codex to improve the code. I switched from Ask mode to Code mode so the AI would actually attempt the changes. As with the ChatGPT agent, Codex spins up a virtual terminal to do some of its work.
When the process completed, Codex showed a diff (the difference between the original and the modified code).
I was happy to see that the changes were quite surgical. Codex didn't try to rewrite large sections of the plugin; it only modified the small areas that needed improvement.
In some areas, it dug in and changed a few other lines, but those changes were still very specific to the original prompt.
At one point, I was curious why it added a new foreach loop to iterate over an array, so I asked.
As you can see above, I received a very clear explanation of its reasoning. That made sense to me, so I moved on and continued reviewing Codex's proposed changes.
All told, Codex proposed changes across nine separate files. Once I was satisfied with the changes, I clicked Create PR. That creates a pull request, which is how anyone on GitHub proposes changes to a user's codebase. Once the PR is created, the project owner (me, in this case) has the option to approve the changes, which merges them into the actual code.
It's a good mechanism, and Codex does a clean job of working within the GitHub environment.
Once I was convinced the changes were good, I merged Codex's work back into the main codebase.
Houston, we have a problem
I pulled the changes down from GitHub to my testing machine and tried to run the newly updated plugin. Wait for it…
Yeah. Not so much. To be fair, I've generated my share of error screens like this one, so I can't really get angry at the AI.
Instead, I took a screenshot of the error and sent it to Codex, along with the prompt, "The selective content plugin now fails after making the changes you suggested. Here are the errors."
It took the AI three minutes to suggest a fix, which it presented to me in a new diff.
I merged that change into the codebase, pulled it down to my test server once again, and there it was. Crisis averted.
No vibe, no flow
When I'm not in a crunch and I have time, coding can induce a very pleasant state of mind. I get into a kind of flow with the language, the machine, and the relationship between my fingers and the computer's CPU. Not only is it a lot of fun, but it can also be emotionally transporting.
Working with ChatGPT Codex was not like that. It wasn't unpleasant. It just wasn't fun. It felt less like a meeting of the minds with a coding buddy and more like exchanging emails with a renovation contractor.
Also: How to use GPT-5 in VS Code with GitHub Copilot
Codex delivered its responses in about 10 or 15 minutes, whereas the same code might have taken me a few hours.
Would I have made the same bugs Codex did? Probably not. In thinking through that algorithm, I most likely would have avoided the mistake Codex made. But I would undoubtedly have made other mistakes, whether logic errors or syntax errors.
To be fair, had I introduced a bug like Codex did, it would have taken me more than three minutes to find and fix it. Add at least another hour or so.
So Codex worked, but I wasn't in flow. Generally, when I'm coding and I'm inside a particular file or subsystem, I do a lot of work in that area. It's like cleaning day: if you're cleaning one part of the bathroom, you might as well clean the rest of it.
But Codex clearly works best with small, simple instructions. Give it one discrete change, and work through that one change before introducing new factors. As I said, it works, and it's a useful tool. But using it definitely felt more like management than programming, even though it saved me a lot of time.
Also: Google's Jules AI coding agent built a new feature I could actually ship – while I made coffee
I don't have tangible test results, but after trying Google's Jules back in May and now ChatGPT Codex, I have the feeling that Jules is able to gain a deeper understanding of the code. At this point, I can't really support that claim with a lot of data; it's just an impression.
I'm going to try running another project through Jules. It will be interesting to see whether Codex changes much once OpenAI feels safe enough to incorporate GPT-5. Keep in mind that OpenAI eats its own dog food with Codex, meaning it uses Codex to build its own code. The company must have seen the same iffy results I found in my tests. They'll be waiting until GPT-5 has baked for a while.
Have you tried using AI coding tools such as ChatGPT Codex or Google Jules in your development workflow? What kinds of tasks did you throw at them? How well did they perform? Did you feel the process helped you work more efficiently? Did it slow you down or throw you out of your coding flow?
Do you prefer to give your tools small, surgical jobs, or are you looking for an agent that can handle large-scale architecture and refactoring? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.

