When artificial intelligence (AI) is allowed to behave more like a human communicator, it becomes a more effective debate partner that reaches more accurate conclusions, scientists have found.
Human communication is full of stops and starts, impassioned interruptions, unsure silences and ambiguity. AI, on the other hand, adheres to the formal communication style of computers — processing a command, formulating a response, delivering the output, and waiting patiently for the next command.
Sei and his co-workers proposed a framework where large language models (LLMs) didn’t have to adhere to the back-and-forth, wait-your-turn nature of computerized communication. Instead, an LLM could be assigned a personality that let it speak out of turn, cut off other speakers, or remain silent.
Beyond making AI communication more humanlike, the researchers found that this flexibility produced higher accuracy on complex tasks than standard LLMs achieved.
A host of personalities
The team started by giving LLMs traits based on the "big five" personality model from psychology — openness, conscientiousness, extraversion, agreeableness and neuroticism.
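As a rough illustration of how trait levels might be attached to an agent, the sketch below renders big-five scores into a system-prompt fragment. The prompt wording, the 0-to-1 scale, and the function name are assumptions for illustration, not the study's actual code.

```python
# Hypothetical sketch: encoding big-five trait levels as a system prompt
# for an LLM discussion agent. Scale and wording are illustrative only.

BIG_FIVE = ("openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism")

def persona_prompt(levels: dict[str, float]) -> str:
    """Render trait levels (0.0 = low, 1.0 = high) into a prompt fragment.

    Traits not listed in `levels` default to a neutral 0.5.
    """
    unknown = set(levels) - set(BIG_FIVE)
    if unknown:
        raise ValueError(f"not big-five traits: {unknown}")
    lines = [f"- {trait}: {levels.get(trait, 0.5):.1f} (0 = low, 1 = high)"
             for trait in BIG_FIVE]
    return ("You are a discussion agent with this personality profile:\n"
            + "\n".join(lines))

print(persona_prompt({"extraversion": 0.9, "agreeableness": 0.2}))
```

A prompt like this would then be prepended to each agent's context, so two agents answering the same question argue in recognizably different styles.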
The next step was to reprogram text-based LLMs to process responses sentence by sentence, rather than waiting for each full response before the next began, which allowed the researchers to carefully control the flow of discussion. They also compared results across three conversational settings — fixed speaking order, dynamic speaking order, and dynamic speaking order with interruption enabled. The last setting let each model compute an "urgency score," enabling it to follow and react to the conversation in real time.
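The three settings can be sketched as different speaker-selection rules. Everything here — the function name, the urgency values, and the cut-in threshold — is a stand-in for illustration; in the study the urgency signal came from the LLMs themselves.

```python
# Illustrative sketch of the three discussion settings described above.
# Urgency values are mocked; real agents would be LLM calls.

def next_speaker(mode: str, agents: list[str], turn: int,
                 urgency: dict[str, float],
                 threshold: float = 0.8) -> str:
    """Pick the next speaker under one of three assumed rules."""
    if mode == "fixed":          # strict round-robin, wait-your-turn
        return agents[turn % len(agents)]
    most_urgent = max(agents, key=lambda a: urgency[a])
    if mode == "dynamic":        # whoever is most urgent speaks next
        return most_urgent
    if mode == "interrupt":      # round-robin, but an urgent agent may cut in
        if urgency[most_urgent] >= threshold:
            return most_urgent
        return agents[turn % len(agents)]
    raise ValueError(f"unknown mode: {mode}")
```

In the "interrupt" mode, the round-robin order still applies by default; an agent only seizes the floor when its urgency crosses the threshold.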
The urgency score shaped the conversation in several ways. If it spiked because the model spotted an error or a point it considered critical to the discussion, the model could raise the issue immediately, regardless of whose turn it was to speak. If the score was low, the model interpreted this as having nothing concrete to add and stayed silent, cutting down on conversational "clutter" — speech for its own sake.
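The spike-or-stay-silent behavior can be sketched as two thresholds applied sentence by sentence. The keyword heuristic below is purely a stand-in for the model's own judgment, and both threshold values are assumptions.

```python
# Minimal sketch of urgency-driven interruption, assuming a toy keyword
# heuristic in place of the LLM's own urgency estimate.

SPEAK_THRESHOLD = 0.8    # assumed: above this, cut in out of turn
SILENCE_THRESHOLD = 0.2  # assumed: below this, nothing concrete to add

def urgency_score(sentence: str) -> float:
    """Toy heuristic: spike on apparent errors or critical points."""
    score = 0.1
    text = sentence.lower()
    if "error" in text or "wrong" in text:
        score += 0.8
    if "critical" in text:
        score += 0.5
    return min(score, 1.0)

def decide(sentence: str, my_turn: bool) -> str:
    """Decide whether to interrupt, speak normally, or stay quiet."""
    u = urgency_score(sentence)
    if u >= SPEAK_THRESHOLD:
        return "interrupt"   # raise the point immediately
    if my_turn and u > SILENCE_THRESHOLD:
        return "speak"
    return "silent"          # reduces conversational clutter
```

Because the decision runs after every incoming sentence rather than after every full response, an agent can react mid-utterance — which is what the sentence-by-sentence reprogramming makes possible.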
Sei told Live Science that the team evaluated performance using 1,000 questions from the Massive Multitask Language Understanding (MMLU) benchmark — an AI reasoning test encompassing questions from different areas, including science and humanities.
“When one agent initially gave an incorrect answer, overall accuracy was 68.7% with fixed-order discussion, 73.8% with dynamic order, and 79.2% when interruption was allowed,” Sei said. “In a more difficult setting where two agents initially gave incorrect answers, accuracy was 37.2% with fixed order, 43.7% with dynamic order, and 49.5% with interruption enabled.”
Having shown that the personality-driven models were more accurate than traditional AI chatbots, Sei now wants to explore how the findings can be applied in practice. The team plans to test them in domains involving creative collaboration, to understand how "digital personalities" shape decision-making within a group.
“In the future, AI agents will increasingly interact with one another and with humans in collaborative settings,” said Sei. “Our findings suggest that discussions shaped by personality, including the ability to interrupt when necessary, may sometimes produce better outcomes than strictly turn-based and uniformly polite exchanges.”