2604.24155v1 Apr 27, 2026 cs.CY

The Alignment Target Problem: Divergent Moral Judgments of Humans, AI Systems, and Their Designers

Original Abstract

The quest to align machine behavior with human values raises fundamental questions about the moral frameworks that should govern AI decision-making. Much alignment research assumes that the appropriate benchmark is how humans themselves would act in a given situation. Research into agent-type value forks has challenged this assumption by showing that people do not always hold AI systems to the same moral standards as humans. Yet this challenge is subject to two further questions: whether people evaluate AI behavior differently when its human origins are made visible, and whether people hold the humans who program AI systems to different moral standards than either the humans or the machines under evaluation. An experimental study on 1,002 U.S. adults measured moral judgments in a runaway mine train scenario, varying the subject of evaluation across four conditions: a repairman, a repair robot, a repair robot programmed by company engineers, and company engineers programming a repair robot. We find no significant variation in the moral standards applied to the repairman and the robot. However, moral judgments shifted substantially when robot actions were described as the product of human design. Participants exhibited markedly more deontological reasoning when evaluating the robot programmed by engineers or the engineers programming it, suggesting that making human design visible activates heightened moral constraints. These findings provide evidence that people apply meaningfully different moral standards to AI systems, to humans acting in the same situation, and to the humans who design them. We call this divergence the alignment target problem. Whether these plural normative standards can be reconciled into a coherent framework for AI governance in high-stakes domains remains an open question.
