Our story begins, as many stories do, with a man and his AI. The man, like many men, is a bit of a geek and a bit of a programmer. He also needs a haircut.
The AI is the culmination of thousands of years of human advancement, all put to the service of making the man’s life a little easier. The man, of course, is me. I’m that guy.
Also: The best AI for coding in 2025 (and what not to use)
Unfortunately, while AI can be incredibly brilliant, it also has a propensity to lie, mislead, and make shockingly stupid mistakes. It is the stupid part that we will be discussing in this article.
Anecdotal evidence does have value. My reports on how I’ve solved some problems quickly with AI are real. The programs I used AI to help write are still in use. I have used AI to speed up parts of my programming flow, especially in the sweet spots where I’m less productive and the AI is quite knowledgeable, like writing functions that call publicly documented APIs.
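To give a sense of what that sweet spot looks like, here’s a minimal sketch of the kind of API-wrapper function I’m happy to hand off to an AI. The function name is mine, invented for illustration, and the code assumes the requests library is installed; the GitHub endpoint it calls is public and well documented, which is exactly why an AI tends to get this sort of thing right:

```python
# A thin wrapper around a public, well-documented API: the kind of
# boilerplate an AI writes faster than I do.
# Assumes: pip install requests
import requests

def get_repo_stars(owner: str, repo: str) -> int:
    """Return the star count for a public GitHub repository."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}",
        headers={"Accept": "application/vnd.github+json"},
        timeout=10,
    )
    resp.raise_for_status()  # fail loudly on 4xx/5xx responses
    return resp.json()["stargazers_count"]

print(get_repo_stars("torvalds", "linux"))
```

Even here, I still read the code and run it before trusting it.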
Also: I’m an AI tools expert, and these are the only two I pay for (plus three I’m considering)
You know how we got here. Generative AI burst onto the scene at the cusp of 2023 and has been blasting its way into knowledge work ever since.
One area, as the narrative goes, where AI truly shines is its ability to write code and help manage IT systems. Those claims are not untrue. I have shown, several times, how AI has solved coding and systems engineering problems I have personally experienced.
AI coding in the real world: What science reveals
New tools always come with big promises. But do they deliver in real-world settings?
Most of my reporting on programming effectiveness has been based on personal anecdotal evidence: my own programming experiences using AI. But I’m one guy. I have limited time to devote to programming and, like every programmer, I have certain areas where I spend most of my coding time.
Also: I tested 10 AI content detectors – and these 5 correctly identified AI text every time
Recently, though, a nonprofit research organization called METR (Model Evaluation & Threat Research) did a more thorough analysis of AI coding productivity.
Their methodology seems sound. They worked with 16 experienced open-source developers who actively contribute to large, popular repositories. The METR analysts gave those developers 246 real issues from those repositories that needed fixing. For roughly half the issues, the coders had to work on their own; for the other half, they could use an AI for help.
The results were striking and unexpected. While the developers themselves estimated that AI assistance increased their productivity by an average of 24%, METR’s analytics showed instead that AI assistance slowed them down by an average of 19%.
That’s a bit of a head-scratcher. METR put together a list of factors that might explain the slowdown, including over-optimism about AI usefulness, the developers’ deep familiarity with their own repositories (versus the AI’s lack of it), the complexity of large repositories, low AI reliability, and the recurring problem of the AI failing to use “important tacit knowledge or context.”
Also: How AI coding agents could destroy open-source software
I would suggest that two other factors might have limited effectiveness:
Choice of problem: The developers were told which issues they had to use AI help on and which they couldn’t. My experience suggests knowledgeable developers should choose where to use AI based on the problem being solved. In my case, for example, getting the AI to write a regular expression (something I don’t like doing and am fairly crappy at) would save me far more time than getting it to modify unique code I’ve already written, work on regularly, and know inside and out. (See the regex sketch after this list.)
Choice of AI: According to the report, the developers used Cursor, an AI-centric fork of VS Code, which at the time relied on Claude 3.5/3.7 Sonnet. When I tested 3.5 Sonnet, the results were terrible, with Sonnet failing three of my four tests. My subsequent tests of Claude 4 Sonnet went considerably better. METR also reported that developers rejected more than 65% of the code the AI generated, and reviewing and throwing away that much code eats up time.
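To make the first point concrete, here’s the sort of regex task I’d gladly hand to an AI. The pattern and sample line below are my own invention, not anything from the METR study; verifying the pattern against real data is still my job:

```python
# Hypothetical example: extract ISO-8601 dates (YYYY-MM-DD) from log
# lines. Exactly the kind of fiddly pattern I'd rather not write myself.
import re

ISO_DATE = re.compile(r"\b(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b")

line = "2025-03-07 14:02:11 deploy finished"
match = ISO_DATE.search(line)
if match:
    print(match.group(0))  # prints: 2025-03-07
```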
That time when ChatGPT suggested nuking my system
METR’s results are interesting. AI is clearly a double-edged sword when it comes to coding help, but there’s also no doubt that AI can provide considerable value to coders. If anything, I think this test once again supports the contention that AI is a great tool for experienced programmers, but a potentially high-risk resource for newbies.
Also: Why I’m switching to VS Code. Hint: It’s all about AI tool integration
Let’s look at a concrete example, one that could have cost me a lot of time and trouble had I followed ChatGPT’s advice.
I was setting up a Docker container on my home lab using Portainer (a tool that helps manage Docker containers). For some reason, Portainer would not enable the Deploy button to create the container.
It had been a long day, so I didn’t see the obvious problem. Instead, I asked ChatGPT. I fed ChatGPT screenshots of the configuration, as well as my Docker configuration file.
ChatGPT recommended that I uninstall and reinstall Portainer. It also suggested I remove Docker from the Linux distro and reinstall it with the package manager. Those actions would have killed all my containers.
Notably, ChatGPT never asked whether I had backups of those containers, or suggested making any. It just handed me command-line sequences to cut and paste that would delete and rebuild Portainer and Docker. It was a wildly destructive and irresponsible recommendation.
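For the record, preserving the containers first wouldn’t even have been hard. Here’s a minimal sketch, assuming the docker Python SDK (docker-py) is installed, that snapshots every container to a tagged image before any teardown. It’s not a complete backup strategy, since commit() doesn’t capture named volumes, but it’s the bare minimum a responsible adviser would suggest:

```python
# A minimal pre-teardown snapshot, not a full backup strategy.
# Assumes: pip install docker, and a reachable Docker daemon.
import docker

client = docker.from_env()

# Commit every container (running or stopped) to a tagged image
# before doing anything destructive, like reinstalling Docker.
for container in client.containers.list(all=True):
    image = container.commit(
        repository=f"backup/{container.name.lower()}",
        tag="pre-reinstall",
    )
    print(f"Committed {container.name} -> {image.tags}")

# Note: commit() captures the container's filesystem, not its named
# volumes. Volumes need to be archived separately.
```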
The irony is that ChatGPT never figured out why Portainer wouldn’t let me deploy the new container, but I did. It turns out I never filled out the container’s name field. That’s it.
Also: What is AI vibe coding? It’s all the rage but it’s not for everyone – here’s why
Because I’m fairly experienced, I hesitated when ChatGPT told me to nuke my installation. But someone relying solely on the AI’s advice could have brought down an entire server, all for want of typing in a container name.
Overconfident and underinformed AIs: A dangerous combo
I’ve also seen the AI go completely off the rails, giving advice that was not only useless but delivered with the apparent confidence of an expert.
Also: Google’s Jules AI coding agent built a new feature I could actually ship – while I made coffee
If you’re going to use AI tools to support your development or IT work, these tips might keep you out of trouble:
- If there’s not much publicly available information on a topic, the AI can’t help. Worse, it will make stuff up based on the little it does know, without ever admitting how little that is.
- Like my dog, once the AI fixates on one thing, it often refuses to consider alternatives. If the AI is stuck on one approach, don’t believe its polite promises to try a new one; it’s still headed down the same rabbit hole. Start a new session.
- If you don’t know a lot, don’t rely on the AI. Keep up your learning. Experienced devs can tell the difference between what will work and what won’t. But if you’re trying to put all the coding on the back of the AI, you won’t know when or where it goes wrong or how to fix it.
- Coders use specific tools for specific tasks. A website might be built using Python, CSS, HTML, JavaScript, Flask, and Jinja, with each tool chosen because you know what it does well. Choose your AI tools the same way. For example, I don’t use AI for business logic, but I gain real productivity using it to write calls to well-documented public APIs, where it can save me a lot of time.
- Test everything an AI produces. Everything. Line by individual line. The AI can save a ton of time, but it can also make enormous mistakes, and the time and energy you spend testing by hand is what catches them. If the AI offers to write unit tests, let it. But test the tests (see the sketch below).
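Here’s what “test the tests” looks like in practice. The parse_version() helper below is hypothetical, a stand-in for any small function an AI might write for you, and the checks assume pytest is installed:

```python
# Assumes: pip install pytest
import pytest

# parse_version() is a hypothetical stand-in for AI-generated code.
def parse_version(s: str) -> tuple[int, int, int]:
    """Parse a 'major.minor.patch' string into a tuple of ints."""
    major, minor, patch = s.split(".")
    return int(major), int(minor), int(patch)

# Hand-written checks, including the edge case an AI's own unit
# tests often skip.
def test_happy_path():
    assert parse_version("1.2.3") == (1, 2, 3)

def test_rejects_malformed_input():
    with pytest.raises(ValueError):
        parse_version("not-a-version")  # unpacking one part fails
```

If the AI also generated tests, read them with the same suspicion you read the code: a test that never exercises a failure path isn’t really a test.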
Based on your experience level, here’s how I recommend you think about AI assistance:
- If you know nothing about a subject or skill: AI can help you pass as if you do, but it could be amazingly wrong, and you might not know.
- If you’re an expert in a subject or skill: AI can help, but it will piss you off. Your expertise gets used not only to separate the AI-stupid from the AI-useful, but to carefully craft a path where AI can actually help.
- If you’re in between: AI is a mixed bag. It could help you or get you in trouble. Don’t delegate your skill-building to the AI because it could leave you behind.
Also: How I used ChatGPT to analyze, debug, and rewrite a broken plugin from scratch – in an hour
Generative AI can be an excellent helper for experienced developers and IT pros, especially when used for targeted, well-understood tasks. But its confidence can be deceptive and dangerous.
AI can be useful, but always double-check its work.
Have you used AI tools like ChatGPT or Claude to help with your development or IT work? Did they speed things up, or nearly blow things up? Are you more confident or more cautious when using AI on critical systems? Have you found specific use cases where AI really shines, or where it fails hilariously? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.