Researchers claim their AI model simulates the human mind

July 9, 2025

Cognitive scientists question new Centaur model’s ability to predict human behavior.

By training a large language model (LLM) on a vast data set of human behavior, researchers say they have built an artificial intelligence (AI) system that can mimic a human mind. In a paper published today in Nature, they report that their model, Centaur, can “predict and simulate” human behavior in any experiment that can be written out in natural language.

But other scientists raise their eyebrows at the claim. “I think there’s going to be a big portion of the scientific community that will view this paper very skeptically and be very harsh on it,” says Blake Richards, a computational neuroscientist at McGill University and Mila – Quebec Artificial Intelligence Institute. He and others say the model doesn’t meaningfully mimic human cognitive processes, and that it can’t be trusted to produce results that would match human behavior.

Cognitive scientists often build models to help them understand the systems underlying abilities such as vision and memory. Each of these models captures only a very small, isolated part of human cognition, says Marcel Binz, a cognitive scientist at the Institute for Human-Centered AI at Helmholtz Munich. But with recent advances in LLMs, “we suddenly got this new exciting set of tools” that might be used to understand the mind as a whole, he says.

To develop such a model, Binz and his colleagues created a data set called Psych-101, which contained data from 160 previously published psychology experiments, covering more than 60,000 participants who made more than 10 million choices in total. For example, in two “two-armed bandit” experiments, participants had to repeatedly choose between two virtual slot machines rigged to have unknown or changing probabilities of paying out.
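For readers unfamiliar with the paradigm, a minimal sketch of a two-armed bandit task is shown below. The reward probabilities, trial count, and simple choose-the-best-arm strategy are illustrative assumptions, not details taken from the published experiments.

```python
import random

def run_two_armed_bandit(reward_probs=(0.3, 0.7), n_trials=100):
    """Simulate a participant choosing between two slot machines.

    Each machine pays out with a fixed but unknown probability; the
    (illustrative) chooser picks the arm with the higher observed payout
    rate, exploring at random 10% of the time.
    """
    pulls = [0, 0]   # how often each arm was chosen
    wins = [0, 0]    # how often each arm paid out
    choices = []

    for _ in range(n_trials):
        if random.random() < 0.1 or 0 in pulls:
            arm = random.randrange(2)            # explore
        else:
            rates = [wins[i] / pulls[i] for i in range(2)]
            arm = rates.index(max(rates))        # exploit the best arm so far
        reward = 1 if random.random() < reward_probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        choices.append((arm, reward))
    return choices

if __name__ == "__main__":
    print(run_two_armed_bandit()[:5])
```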

The researchers then trained Llama, an LLM produced by Meta, by feeding it information about the decisions participants faced in each experiment and the choices they made. They called the resulting model “Centaur”—the closest mythical beast they could find to something half-llama, half-human, Binz says.
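The paper describes presenting each experiment to the model as a natural-language transcript of trials and choices. The exact prompt wording is the authors’; the formatter below is a hypothetical sketch of the general idea.

```python
def format_bandit_trial(trial_num, chosen_arm, reward):
    """Turn one bandit trial into a line of natural-language text.

    The wording here is invented for illustration; the actual Psych-101
    prompts are defined by Binz and colleagues.
    """
    return (f"Trial {trial_num}: the participant chose machine "
            f"{'A' if chosen_arm == 0 else 'B'} and "
            f"{'won' if reward else 'did not win'} a point.")

# Example: turn a short choice history into a transcript.
history = [(0, 1), (1, 0), (1, 1)]
transcript = "\n".join(
    format_bandit_trial(i + 1, arm, r) for i, (arm, r) in enumerate(history)
)
print(transcript)
```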

For each experiment, they used 90% of the human data to train the model and then tested whether its output matched the remaining 10%. Across experiments, they found Centaur aligned with the human data more closely than did more task-specific cognitive models. In the two-armed bandit decisions, for example, Centaur produced data that looked more like participants’ slot machine choices than did a model specifically designed to capture how people make decisions in this task.
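A hedged sketch of this kind of held-out comparison: score how much probability each candidate model assigns to the 10% of choices it was not fitted on, with lower scores meaning a closer match to human behavior. The models, numbers, and metric below are simplified stand-ins, not the paper’s actual implementations.

```python
import math

def held_out_nll(predicted_probs, actual_choices):
    """Average negative log-likelihood of held-out choices.

    predicted_probs: per-trial probability a model assigns to choosing arm 1.
    actual_choices: the arms (0 or 1) participants actually chose.
    Lower values mean the model tracks human behavior more closely.
    """
    nll = 0.0
    for p, choice in zip(predicted_probs, actual_choices):
        p_choice = p if choice == 1 else 1.0 - p
        nll -= math.log(max(p_choice, 1e-12))  # guard against log(0)
    return nll / len(actual_choices)

# Toy held-out data (illustrative only).
human_choices = [1, 1, 0, 1, 0, 1]
llm_like_preds = [0.8, 0.7, 0.3, 0.9, 0.4, 0.6]   # stand-in for an LLM's predictions
task_model_preds = [0.6, 0.6, 0.5, 0.6, 0.5, 0.6]  # stand-in for a task-specific model

print(held_out_nll(llm_like_preds, human_choices))
print(held_out_nll(task_model_preds, human_choices))
```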

Centaur also produced humanlike outputs on modified tasks that weren’t in its training data, such as a version of the two-armed bandit experiment that adds a third slot machine. That means researchers could use Centaur to develop experiments “in silico” before taking them to human participants, Binz says, or to develop new theories of human behavior.

But Jeffrey Bowers, a cognitive scientist at the University of Bristol, thinks the model is “absurd.” He and his colleagues tested Centaur—which Binz’s team had made public when it published a first draft of the paper as a preprint—and found decidedly un-humanlike behavior. In tests of short-term memory, it could recall up to 256 digits, whereas humans can commonly remember approximately seven. In a test of reaction time, the model could be prompted to respond in “superhuman” times of 1 millisecond, Bowers says. This means the model can’t be trusted to generalize beyond its training data, he concludes.

More important, Bowers says, is that Centaur can’t explain anything about human cognition. Much as an analog clock and a digital clock can agree on the time despite vastly different internal workings, Centaur can give humanlike outputs but relies on mechanisms that are nothing like those of a human mind, he says.

Federico Adolfi, a computational cognitive scientist at the Max Planck Society’s Ernst Strüngmann Institute for Neuroscience, agrees. Further stringent tests are likely to show that the model is “very easy to break,” he says. And he points out that although the Psych-101 data set is impressively large, 160 experiments is “a grain of sand in the infinite pool of cognition.”

But others see some value in the paper. Rachel Heaton, a vision scientist at the University of Illinois Urbana-Champaign, says the model doesn’t offer useful tools for understanding human cognition, but thinks the Psych-101 data set is a useful contribution in its own right because other researchers can use it to test the success of their models. Richards says future studies to understand what’s going on under the hood of Centaur could also be valuable.

Many computational neuroscientists are “cautiously excited” about new tools like Centaur, adds Katherine Storrs, a computational visual neuroscientist at the University of Auckland. The paper makes some unjustified sweeping claims, she says, but a lot of time and effort has gone into the data set and model, and the work “may end up paying off scientifically in the long run.”


Source: https://www.science.org/content/article/researchers-claim-their-ai-model-simulates-human-mind-others-are-skeptical