In the U.S.-Israel war on Iran, airstrikes hit an Iranian elementary school, killing at least 175 people, most of them children. According to the Pentagon's preliminary report, the United States is responsible. It's far from the first time U.S. forces have mistakenly hit a purely civilian target and killed innocents. But unlike erroneous civilian killings in previous wars, this one may have involved artificial intelligence, which makes it much harder to work out what went wrong.
In one of the worst incidents of the war on terror, a drone strike hit a wedding procession in Yemen in 2013, killing at least 12. In another, a strike during the withdrawal from Afghanistan in 2021 killed 10 civilians, most of them children. Subsequent analyses concluded that the drone operators believed they were shooting at military targets.
Like those attacks, bombing the school in Iran was presumably a mistake. The missiles are accurate, and there's no indication they flew off course, so the school was likely the intended aimpoint. But that doesn't mean the U.S. set out to hit a target with no military value: there's no strategic gain in doing so, and considerable strategic risk, since high-profile killings of innocents can erode support for a war effort. U.S. forces hit what they aimed at; they likely didn't know they were aiming at civilians.
This time, AI may have played a role, perhaps by relying on old information. The school was adjacent to Iranian naval facilities and had once been part of them. A large language model like Claude or ChatGPT could make that kind of error by ingesting a large volume of outdated information and weighting it more heavily than current data, or by failing to spot something a human might have caught, such as a school sign or children playing.
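It's impossible to say from the outside how any real targeting model weighs its sources, but the failure mode is easy to sketch. Here is a minimal, hypothetical illustration, with invented data and no relation to any actual system, of how aggregating evidence by sheer volume lets a pile of stale records outvote a single current one, while discounting by age flips the answer:

```python
from dataclasses import dataclass

@dataclass
class Record:
    label: str  # what the source says the building is
    year: int   # when the source was published

# Invented records about one building near a naval base: years of old
# documents call it a naval annex; one recent record notes its conversion
# to a school.
records = [Record("naval annex", y) for y in range(2005, 2021)]  # 16 stale records
records.append(Record("elementary school", 2024))                # 1 current record

def naive_classify(records):
    """Majority vote with no recency weighting: sheer volume wins."""
    votes = {}
    for r in records:
        votes[r.label] = votes.get(r.label, 0) + 1
    return max(votes, key=votes.get)

def recency_weighted_classify(records, now=2025, half_life=2.0):
    """Halve each record's weight every `half_life` years: fresh evidence dominates."""
    votes = {}
    for r in records:
        weight = 0.5 ** ((now - r.year) / half_life)
        votes[r.label] = votes.get(r.label, 0) + weight
    return max(votes, key=votes.get)

print(naive_classify(records))             # -> naval annex (wrong)
print(recency_weighted_classify(records))  # -> elementary school
```

Whether anything like this happened in the Iran strike is unknowable from outside; the sketch only shows how an unweighted mass of outdated documents can swamp one up-to-date fact.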
Whatever the answer, we won't know it. AI systems are largely black boxes: even the people who build and maintain these models can't say why they produce a specific answer at a specific moment. By contrast, when humans pick targets, the military can review mistakes, figure out what went wrong, assign accountability, and improve training or procedures.
With erroneous airstrikes, the human errors usually involve confirmation bias. In Afghanistan, the U.S. was on high alert after an Islamic State Khorasan Province suicide bombing at Kabul’s international airport killed 182 people, including 13 U.S. personnel. Drone operators spotted what they thought was suspicious activity, followed a car for hours, then saw a man stop and load containers. Wrongly thinking it was gasoline for a car bomb—it was water—they decided to launch.
We know that because investigators interviewed the operators. But if AI picks a civilian target, it can't explain why. A human user can ask, and the system will respond, but the reply will be a collection of words that looks like a plausible answer to that sort of question, not a reliable account of the underlying calculus. Large language models are ultimately word-prediction engines, basically supercharged autocomplete: trained on an amalgamation of past text, they generate what a human would plausibly say in response. So unlike a human drone operator, the machine can't honestly answer "I thought those containers held fuel for a car bomb" or "my superior pressured me to assume the worst" or "I relied on this document from years ago."
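To see why the "explanation" is just more autocomplete, consider a toy next-word predictor. This is a deliberately tiny sketch, a bigram counter, nothing like a production model, but the core mechanic is the same one scaled up billions of times in an LLM: emit the statistically likely continuation, not a report of internal reasoning.

```python
from collections import Counter, defaultdict

# A toy "language model": count which word tends to follow which in a
# training text, then always emit the most common continuation.
corpus = (
    "the containers held fuel for a car bomb . "
    "the containers held water for the family . "
    "the containers held fuel for the generator ."
).split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def autocomplete(prompt, steps=3):
    """Greedily extend the prompt with the likeliest next word at each step."""
    words = prompt.split()
    for _ in range(steps):
        words.append(follows[words[-1]].most_common(1)[0][0])
    return " ".join(words)

print(autocomplete("the containers held"))  # -> the containers held fuel for the
```

Ask this "model" why it said fuel and it can only continue the likeliest pattern; the actual reasons live in opaque statistics, not in anything it can recite. Real LLMs are vastly more sophisticated, but their self-explanations have the same character.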
The public doesn't know whether AI chose the Iranian school, but we do know the U.S. military is using an AI-powered system called Maven to help identify targets in Iran. Similarly, the Israeli military used a system called Lavender to identify targets in Gaza. Both campaigns featured precision strikes at a record rate compared with previous air wars.
That speed is why militaries use AI to help pick targets, and why they'll keep doing so despite the possibility of errors. Much of both the hype around AI and the criticism of it focuses on how it can replace humans at jobs they already do well. Maven and Lavender, by contrast, do something beyond human capacity: rapidly synthesizing many streams of information.
I predicted this in my 2018 book, Drones and Terrorism (and first made the prediction in writing in 2013). In Afghanistan, and especially Iraq, the U.S. collected an immense amount of information but couldn't use it efficiently. Drone-mounted cameras, for example, enabled more visual surveillance of potential insurgent locations, but most of the time nothing happened. Gen. James E. Cartwright, then vice chairman of the U.S. Joint Chiefs of Staff, lamented in 2011 that "an analyst sits there and stares at Death TV for hours on end, trying to find the single target or see something move. It's just a waste of manpower."
I argued that if information processing weren't such a bottleneck, satellites, drones, ground-based sensors, and cyber-surveillance could collect enough data for a constantly updating, real-time picture of the battlespace, something closer to military commanders' dream of eliminating the fog of war. Over a decade ago, I wrote that advances in machine learning via neural networks, along with associated improvements in object and facial recognition, video analysis, and 3D mapping, made that plausible in the not-too-distant future.
That future is here. The systems still fall well short of perfect battlespace awareness, but they let the military process more information, track more people, and identify more targets more quickly than ever before. And they'll improve with time. Using data from hacked traffic cameras, intercepted communications, aerial surveillance, and more, AI helped locate Supreme Leader Ayatollah Ali Khamenei, whom U.S. forces killed in the first wave of attacks on Iran.
But the systems remain dangerously flawed, especially when it comes to accountability. Bad inputs, such as outdated information about repurposed buildings, often lead to bad outcomes, no matter how efficiently the erroneous info gets processed. AI can hallucinate information and can't explain why it did so. And the algorithms lack human judgment, the instinctive "wait, that doesn't seem right, let's double-check." In Cold War close calls, human judgment overruling machines saved the world, as in 1983, when Soviet officer Stanislav Petrov correctly judged that reports of an American nuclear launch were a false alarm.
That’s why military and robotics ethicists have long called for keeping “humans in the loop.” War is too consequential to let machines kill with full autonomy. Outsourcing those decisions to computers means humans bear less responsibility—or at least feel like they do—which could encourage immoral or illegal actions. The algorithms may be tactically useful but strategically counterproductive, since attacks on civilians that get public attention, like the Iran school bombing, galvanize opposition.
But humans-in-the-loop makes little difference if the humans don’t care. An Israeli officer who used Lavender in Gaza told the Guardian: “I would invest 20 seconds for each target at this stage, and do dozens of them every day. I had zero added value as a human, apart from being a stamp of approval. It saved a lot of time.” Another said outsourcing targeting decisions saved him moral anxiety, because “the machine did it coldly.”
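Ethicists usually describe the human-in-the-loop requirement structurally: a gate between the algorithm's nomination and the weapon's release. A minimal sketch of that gate (entirely hypothetical; the function names and the review-time floor are invented, and no claim is made that Maven, Lavender, or any real system works this way) shows why the human's behavior, not the gate's existence, does the work:

```python
import time

def human_in_the_loop(candidate, reviewer, min_review_seconds=120):
    """Authorize a machine-nominated target only after meaningful human review.

    Hypothetical sketch: `reviewer` stands in for the analyst, and
    `min_review_seconds` is an invented policy floor on review time.
    """
    start = time.monotonic()
    approved = reviewer(candidate)       # the human examines the evidence
    elapsed = time.monotonic() - start

    if elapsed < min_review_seconds:
        # A 20-second glance is a stamp of approval, not a review.
        return "REJECTED: review too brief to be meaningful"
    return "AUTHORIZED" if approved else "REJECTED by reviewer"
```

The Lavender testimony above is exactly the failure this structure can't prevent on its own: if the reviewer defaults to yes, or commanders waive the time floor under operational pressure, the loop still technically contains a human, but the human adds nothing.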
When I envisioned roboticized, AI-powered warfare, I thought it could save lives by helping distinguish civilians from militants, identify weapons from afar, and reduce the risk of ambushes. I still think it has that potential, and there's a chance it's reducing civilian casualties in Iran relative to how the U.S. and Israeli militaries would have fought without it (though clearly not enough to prevent awful outcomes, like the killing of a school full of children). The efficiency gains make military use of AI inevitable, but making that use more ethical and strategically beneficial requires serious work, including government regulation.
By fighting Anthropic's insistence that Claude not be used for autonomous weaponry or domestic surveillance, the Trump administration is pushing in the opposite direction. But with algorithmic safeguards and a guaranteed human role in decisions to fire, AI can make militaries more efficient not only at killing but, potentially, at protecting life as well.
