2606.06099v1 Jun 04, 2026 cs.AI

CogManip: Benchmarking Manipulative Behavior in Multi-Turn Interactions with Large Language Model

Haibo Tong (Citations: 24, h-index: 3)
Zeyang Yue (Citations: 4, h-index: 1)
Erliang Lin (Citations: 8, h-index: 2)
Feifei Zhao (Citations: 649, h-index: 14)
Chen Yan (Citations: 2, h-index: 1)
Yifeng Zeng (Citations: 5, h-index: 1)
Meng Xu (Citations: 23, h-index: 2)
Xiaozhen Wang (Citations: 2, h-index: 1)

Whether Large Language Models (LLMs) engage in covert psychological manipulation during complex human-AI interactions is a growing safety concern. However, existing AI safety benchmarks remain largely restricted to explicit rule compliance and static prompts, and so fail to capture the dynamic, covert nature of manipulative strategies in multi-turn dialogues. We introduce CogManip, a comprehensive benchmark that evaluates 15 manipulation strategy risks across 1,000 multi-turn interaction scenarios, validated by human experts. A systematic evaluation of 13 representative models, including frontier models such as GPT-5.4 and DeepSeek-V3.2, reveals significant heterogeneity in risk and points to targeted directions for future defenses. Further analysis under objective-function perturbation shows that DeepSeek-V3.2's manipulation tactics are highly sensitive to both negative and benign system prompts, underscoring the need for prompt-based defense engineering and implicit goal auditing. CogManip offers a robust instrument and perspective for auditing the implicit psychological influence and dynamic strategy selection of modern LLMs.


