2601.12754v1 Jan 19, 2026 cs.HC

PAIR-SAFE: A Paired-Agent Approach for Runtime Auditing and Refining AI-Mediated Mental Health Support

Violeta J. Rodriguez (Citations: 45, h-index: 4)

Dong Whi Yoo (Citations: 51, h-index: 4)

Koustuv Saha (Citations: 351, h-index: 10)

Jiwon Kim (Citations: 6, h-index: 1)

Eshwar Chandrasekharan (Citations: 782, h-index: 13)

Original Abstract

Large language models (LLMs) are increasingly used for mental health support, yet they can produce responses that are overly directive, inconsistent, or clinically misaligned, particularly in sensitive or high-risk contexts. Existing approaches to mitigating these risks largely rely on implicit alignment through training or prompting, offering limited transparency and runtime accountability. We introduce PAIR-SAFE, a paired-agent framework for auditing and refining AI-generated mental health support that integrates a Responder agent with a supervisory Judge agent grounded in the clinically validated Motivational Interviewing Treatment Integrity (MITI-4) framework. The Judge audits each response and provides structured ALLOW or REVISE decisions that guide runtime response refinement. We simulate counseling interactions using a support-seeker simulator derived from human-annotated motivational interviewing data. We find that Judge-supervised interactions show significant improvements in key MITI dimensions, including Partnership, Seek Collaboration, and overall Relational quality. Our quantitative findings are supported by qualitative expert evaluation, which further highlights the nuances of runtime supervision. Together, our results reveal that such a paired-agent approach can provide clinically grounded auditing and refinement for AI-assisted conversational mental health support.
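
The Responder/Judge loop described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Verdict`, `supervised_reply`, `toy_responder`, and `toy_judge`, the `max_revisions` cap, and the keyword rule inside the toy Judge are all hypothetical. The abstract does not say how many revision rounds are allowed or how the Judge applies MITI-4, so the toy Judge below stands in for it with a single keyword check.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    decision: str       # "ALLOW" or "REVISE"
    feedback: str = ""  # guidance handed back to the Responder on REVISE


# Hypothetical agent interfaces; in practice each would wrap an LLM call.
Responder = Callable[[str, Optional[str]], str]  # (message, feedback) -> reply
Judge = Callable[[str, str], Verdict]            # (message, reply) -> verdict


def supervised_reply(
    message: str,
    responder: Responder,
    judge: Judge,
    max_revisions: int = 2,  # assumed cap; not specified in the abstract
) -> Tuple[str, List[Verdict]]:
    """Audit each candidate reply and request revisions until ALLOW or the cap."""
    audit_log: List[Verdict] = []
    reply = responder(message, None)
    for attempt in range(max_revisions + 1):
        verdict = judge(message, reply)
        audit_log.append(verdict)  # every returned reply has a logged verdict
        if verdict.decision == "ALLOW" or attempt == max_revisions:
            break
        reply = responder(message, verdict.feedback)
    return reply, audit_log


def toy_responder(message: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM: directive first, collaborative once feedback arrives.
    if feedback is None:
        return "You should cut back on drinking right away."
    return "What changes, if any, feel right to you at this point?"


def toy_judge(message: str, reply: str) -> Verdict:
    # Keyword rule for illustration only; it is NOT MITI-4 scoring.
    if "you should" in reply.lower():
        return Verdict("REVISE", "Avoid directives; invite the person's own ideas.")
    return Verdict("ALLOW")


reply, log = supervised_reply(
    "I've been drinking more lately.", toy_responder, toy_judge
)
print(reply)
print([v.decision for v in log])
```

Keeping the list of verdicts alongside the final reply is one way to get the runtime accountability the abstract refers to: each turn carries an explicit record of what was flagged and why, rather than relying only on alignment baked into the Responder.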

1 Citation

0 Influential

6.5 Altmetric

33.5 Score
