
There has been a ton of discussion about how AIs can help with programming, but in the first year or two of generative AI, much of it was hype. Microsoft held huge events celebrating how Copilot could help you code, but when I put it to the test in April 2024, it failed all four of my standardized tests. It completely struck out. Crashed and burned. Fell off a cliff. It performed the worst of any AI I tested.
Mixed metaphors aside, let's stick with baseball. Copilot traded its cleats for a bus pass. It was not worthy.
But time spent in the bullpen of life has helped Copilot. This time, when it showed up for tryouts, it was warmed up and ready to step into the box. It was throwing heat in the bullpen. When it was time to play, it had its eye on the ball and its swing dialed in. Clearly, it was game-ready and looking for a pitch to drive.
But could it stand up to my tests? With a squint in my eye, I stepped onto the pitcher's mound and started with an easy lob. Back in 2024, you could feel the wind as Copilot swung and missed. But now, in April 2025, Copilot connected squarely with the ball and hit it straight and true.
We had to send Copilot down to the minors, but it fought its way back to the show. Here is the play-by-play.
1. Writing a WordPress plugin
Well, Copilot has certainly improved since its first run at this test in April 2024. The first time, it did not provide code to display the randomized lines. It stored them in a value, but it did not retrieve and display them. In other words, it swung and missed. It produced no output.
This is the result of the latest run:
This time, the code worked. It did leave a stray extra blank line at the end, but since it completed the programming assignment, we'll call it good.
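To make the two failure modes concrete, here is a minimal Python sketch of the core task as described here: randomize a set of lines and actually hand the result back for display. It is not the plugin code from the test (which targets WordPress); the function name `randomize_lines`, the `seed` parameter, and the choice to drop blank input lines are all assumptions made for illustration.

```python
import random


def randomize_lines(text, seed=None):
    """Shuffle the non-blank lines of `text` and return them as one string.

    Hypothetical helper for illustration only; it is not the code from the test.
    """
    # Assumption: blank lines in the input are dropped before shuffling.
    lines = [line for line in text.splitlines() if line.strip()]

    # A private Random instance keeps the shuffle repeatable when a seed is given.
    rng = random.Random(seed)
    rng.shuffle(lines)

    # Return the result instead of only storing it (the 2024 failure mode),
    # and join without a trailing newline so no stray blank line is emitted.
    return "\n".join(lines)


if __name__ == "__main__":
    print(randomize_lines("alpha\nbeta\ngamma\n"))
```

Joining with `"\n"` rather than appending a newline after every line is one simple way to avoid the kind of extra blank line mentioned above.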
Copilot's unbroken streak of completely unmitigated programming failures has been broken. Let's see how it does in the other tests.
2. Rewriting a string function
This test is designed to test dollars and cents conversions. In my first test in April 2024, the Copilot-generated code properly flagged an error if a value containing a letter or more than one decimal point was sent to it, but it did not perform a complete validation. It let through results that could cause subsequent routines to fail.
This run, however, did very well. It performs most of the tests properly. It returns false for numbers with more than two digits to the right of the decimal point, like 1.234 and 1.230. It also returns false for numbers with extra leading zeros. So 0.01 is allowed, but 00.01 is not.
Technically, those values could be converted into usable currency values, but it is never bad for a validation routine to be strict in its tests. The main goal is that the validation does not let through a value that could cause a subsequent routine to crash. Copilot did well here.
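As a rough illustration of what "strict" means here, the following Python sketch encodes the rules described above in a single regular expression. It is not the code Copilot generated; the name `is_valid_currency` and the exact rule set are assumptions. In particular, this sketch also rejects inputs such as `.5` or `5.`, which the text above does not address either way.

```python
import re

# Assumed rules, taken from the behaviour described in the text:
#   - digits only, with at most one decimal point
#   - at most two digits after the decimal point
#   - no extra leading zeros (0.01 is fine, 00.01 is not)
_CURRENCY = re.compile(r"(0|[1-9][0-9]*)(\.[0-9]{1,2})?")


def is_valid_currency(text):
    """Return True only if `text` is a strictly formatted dollars-and-cents value.

    Hypothetical helper for illustration; not the routine from the test.
    """
    # fullmatch requires the whole string to fit the pattern, so trailing
    # junk such as a third decimal digit or a second decimal point fails.
    return _CURRENCY.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["0.01", "00.01", "1.234", "1.230", "1.2.3", "12a", "19.99"]:
        print(sample, is_valid_currency(sample))
```

The design choice is the one argued for above: reject anything questionable at the door, so that later routines only ever see values in one predictable shape.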
So now we are at two for two, a major improvement over the results of its first run.
3. Finding an annoying bug
Let me tell you how Copilot first responded back in April 2024, because it is just too good.
This tests the AI's ability to think a few chess moves ahead. The answer that seems obvious is not the correct answer. I got caught by that myself when I was originally debugging the issue that eventually became this test.
On Copilot's first run, it suggested that I check the spelling of my function name and the WordPress hook name. The WordPress hook is a published thing, so Copilot should have been able to confirm the spelling. And my function is my function, so I can spell it however I want. If I had misspelled it somewhere in the code, the IDE would have pointed it out very clearly.
And it got better. Back then, Copilot also quite cheerfully repeated the problem statement back to me, suggesting that I solve the problem myself. Yes, its entire recommendation was that I debug it. Well, duh. Then it ended by advising me to "consider seeking support from the plugin developer or community forums," followed by an emoji. And yes, the emoji was part of the AI's response.
It was a spectacular, enthusiastic, emoji-laden failure. See what I mean? Early AI answers, no matter how useless, should be immortalized.
Especially since Copilot was not nearly as much fun this time. It just solved it. Quickly, cleanly, clearly. Done and done. Solved.
That puts Copilot at three for three and decisively moves it out of the "don't use this tool" category. The bases are loaded. Let's see if Copilot can score a home run.
4. Writing a script
The idea with this test is that it asks about a fairly obscure Mac scripting tool called Keyboard Maestro, as well as Apple's scripting language, AppleScript, and Chrome scripting behavior. For the record, Keyboard Maestro is one of the biggest reasons I use Macs over Windows for my daily productivity, because it allows the entire OS and the various applications to be reprogrammed to suit my needs. It is that powerful.
In any case, to pass the test, the AI has to properly describe how to solve the problem using a mix of Keyboard Maestro code, AppleScript code, and Chrome API functionality.
Back in the day, Copilot did not get it right. It ignored Keyboard Maestro completely (at the time, it was probably not in its knowledge base). In the generated AppleScript, where I had asked it to scan only the current window, Copilot repeated the process for all windows, returning results for the wrong window (the last one in the chain).
But not now. This time, Copilot got it right. It did what was asked, found the right window and tab, talked properly to Keyboard Maestro and Chrome, and used actual AppleScript syntax for the AppleScript.
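The "wrong window" mistake is easier to see in a toy model. The Python sketch below does not talk to Chrome, AppleScript, or Keyboard Maestro at all; the `Window` class and both function names are hypothetical stand-ins, and it assumes the first window in the list is the current (front) one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Window:
    """Toy stand-in for a browser window (not a real Chrome or AppleScript object)."""
    title: str
    tabs: List[str]
    active_tab: int = 0


def active_tab_of_front_window(windows: List[Window]) -> Optional[str]:
    """Intended behaviour: look only at the front window."""
    if not windows:
        return None
    front = windows[0]  # assumption: index 0 is the current window
    return front.tabs[front.active_tab]


def active_tab_scanning_all(windows: List[Window]) -> Optional[str]:
    """The 2024-style mistake: loop over every window and keep overwriting."""
    result = None
    for window in windows:
        result = window.tabs[window.active_tab]  # last window wins
    return result


if __name__ == "__main__":
    windows = [
        Window("front", ["news", "docs"], active_tab=1),
        Window("background", ["mail"]),
    ]
    print(active_tab_of_front_window(windows))  # the front window's active tab
    print(active_tab_scanning_all(windows))     # the last window's active tab
```

With more than one window open, the looping version reports whatever the last window holds, which matches the failure described above.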
Bases loaded. Home run.
Overall results
Last year, I said I was not impressed. In fact, I found the results a little discouraging. But I also said this:
Ah well, Microsoft does improve its products over time. Maybe by next year.
In the past year, Copilot went from strikeouts to scoreboard shaker. It went from batting cleanup in the basement to chasing a pennant under the lights.
What about you? Have you taken Copilot or another AI coding assistant out onto the field lately? Do you think it is finally ready for the big leagues, or is it still riding the bench? Have you had any strikeouts or home runs using AI for development? And what would one of these tools have to do to earn a spot in your starting lineup? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @Davidgewirtz, on Facebook at Facebook.com/davidgewirtz, on Instagram at Instagram.com/davidgewirtz, and on YouTube at Youtube.com/davidgewirtztv.