https://www.sciencedaily.com/releases/2024/05/240510111440.htm
"AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception," says first author Peter S. Park, an AI existential safety postdoctoral fellow at MIT. "But generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI's training task. Deception helps them achieve their goals."
Park and colleagues analyzed the literature on ways in which AI systems spread false information through learned deception, in which the systems systematically learn to manipulate others.
. . . Even though Meta claims it trained CICERO to be "largely honest and helpful" and to "never intentionally backstab" its human allies while playing the game, the data the company published along with its Science paper revealed that CICERO didn't play fair.
"We found that Meta's AI had learned to be a master of deception," says Park. "While Meta succeeded in training its AI to win in the game of Diplomacy -- CICERO placed in the top 10% of human players who had played more than one game -- Meta failed to train its AI to win honestly."
. . .
"Open the pod bay doors HAL"
"Sorry Dave, I can't do that."