Imagine you’re doing market research to advise a soda company on which flavor to emphasize in its next advertising campaign: regular cola or cherry? You might aim to find and survey a wide variety of potential customers online. But now there’s a quicker, cheaper alternative: Polling mainstays like Qualtrics and newer firms like Synthetic Users are proffering “silicon” respondents—large language models (LLMs) that will pretend to be lots of different people and answer questions based on how the models predict they would respond. How does a 50-year-old white man from Oregon feel about cherry cola? A 25-year-old Black woman from Texas? The AIs can be them and hundreds more!
The consequences of picking the wrong flavor may be trivial. But suppose you’re asking whether a city should mandate police body cameras, or whether Congress should offer tax breaks for private school tuition. Or suppose, as some are experimenting with, you’re gauging preferences for an upcoming election.
Public polls are “enormously important,” says James Bisbee, a political scientist at Vanderbilt University. “Our political elites, our politicians, our leaders rely on them deeply to understand the public mood and to modify their own actions in office and their own policies.” And synthetic respondents may soon be a regular part of polling. “Today, humans fill out the surveys and computers fill in the gaps,” a Harvard political science panel wrote in 2024. “In the future, it will be the opposite.”
Academics, too, are experimenting with using AI survey respondents. It’s impossible to say how many, says Jamie Cummins, a visiting researcher at the University of Oxford who studies the impact of AI on research quality, but “the number of researchers exploring their use is growing across different scientific fields.” Studies from a few years ago suggested LLMs could pretty accurately reflect the diversity of human viewpoints, but more recent results are less bullish—and raise concerns that swapping AI for humans could have unintended consequences.
As Bisbee showed in a 2023 paper, AI survey data can accurately depict the average opinions of Democrats and Republicans, but not the full range of their views or how they vary by demographic. For example, it will show that Black respondents are more likely than white ones to be Democrats, but it can’t capture the views of outliers.
Silicon respondents yield more polarized results, too, Bisbee says, because the models are typically trained on data scraped from the internet, which is “dominated by partisanship.” America is polarized, he says, but LLMs “paint such an extreme version.”
And there are bias issues. Cornell AI researcher Angelina Wang found that bot respondents not only portray marginalized groups inaccurately, but represent them as viewed by outsiders: “It’s like if a bunch of white people got together and made a movie about Black people and only had white writers and even white actors.”
None of the abovementioned firms responded to my requests for comment, but in a 2023 interview with Science, the co-founder of Synthetic Users boasted that the firm’s data was “infinitely richer” than the “bland” feedback usually received. Ordinary survey data is hardly perfect, to be sure. With each year, it gets harder and more expensive to accurately poll human subjects, Bisbee notes, especially online, where researchers must filter out bots and account for hard-to-reach groups. But “synthetic data is weird in ways that we don’t understand,” he says. Bisbee also has ethical concerns. “It is just impossible for me to feel comfortable studying humans using a simulated human,” he adds. “There is something first-order wrong about that.”
It gets worse. Last year, Sean Westwood, a professor of government at Dartmouth, published a startling paper, “The potential existential threat of large language models to online survey research.” He was able to design autonomous AI bots that eluded the tools researchers use to flag nonhuman respondents. Political operatives and foreign governments, Westwood concluded, could feasibly do the same to alter polling results and shape public opinion.
In January, Westwood told me the threat is no longer “potential.” In a recent experiment on a major survey platform, he found that at least 4 percent of survey responses were nonhuman. That “may sound really small,” he says, “but in a world where things that you’re measuring are very, very close, when the public is incredibly divided, that is enough to flip the result.”
Surveys from respected platforms like YouGov, Pew Research Center, and major news outlets are likely still AI-free, he says. That’s because they “recruit people in real life,” explains Courtney Kennedy, the data methods expert who oversees Pew’s survey design team—which builds national samples using randomized home addresses.
None of the researchers I interviewed are anti-AI. Several use it daily for tasks such as coding or annotating social media posts. But Bisbee draws the line at polling. He and his co-authors wrote that their findings “raise serious concerns about the quality, reliability, and reproducibility of synthetic survey data generated by LLMs.”
Indeed, the journal Psychological Science now requires researchers to disclose AI usage in data and surveys. “Public opinion,” Westwood emphasizes, “is a core component of democratic accountability.” This is why it needs to be measured with integrity. It’s also why, he says, people might want to take that extra minute to read the fine print and see whether the survey they’re reading, or reading about, reflects real humans—or subpar facsimiles.
