2602.20624v1 Feb 24, 2026 cs.AI

Physics-based phenomenological characterization of cross-modal bias in multimodal models

Soyeon Caren Han (Citations: 30, h-index: 3)

Hyeongmo Kim (Citations: 13, h-index: 1)

Junhyuk Woo (Citations: 106, h-index: 6)

Sohyun Kang (Citations: 16, h-index: 3)

Yerin Choi, Sogang University (Citations: 12, h-index: 2)

Seungyeon Ji (Citations: 0, h-index: 0)

Hyunsuk Chung (Citations: 176, h-index: 7)

Kyungreem Han (Citations: 4, h-index: 1)

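Evaluating the fairness of AI models requires considering both comparative fairness (e.g., "treat like cases alike") and the unfairness that arises from a model's inaccuracy, arbitrariness, or opacity. Recent advances in multimodal large language models (MLLMs) have brought major progress in multimodal understanding, reasoning, and generation, but we argue that subtle distortions arising from complex multimodal interactions can lead to systematic bias. This paper has two aims. First, it introduces AI researchers to explainable approaches grounded in the physical entities that the machine experiences during training and inference, in contrast to traditional cognitivist symbolic or metaphysical approaches. Second, it argues that these phenomenological principles will be practically useful for addressing algorithmic fairness problems in MLLMs. We develop a surrogate physics-based model that describes how transformers operate (i.e., semantic network structure and self-/cross-attention) and use it to analyze the dynamics of cross-modal bias in MLLMs, which conventional embedding- or representation-level analyses cannot fully capture. We support these claims with diagnostic experiments over multiple inputs: 1) perturbation-based analysis of emotion classification using Qwen2.5-Omni and Gemma 3n, and 2) dynamical analysis of Lorenz chaotic time-series prediction through the physical surrogate model. Across two architecturally distinct MLLMs, we show that multimodal inputs can reinforce modality dominance rather than mitigate it, as revealed by structured error-attractor patterns under systematic label perturbation and corroborated by dynamical analysis.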

Original Abstract

The term 'algorithmic fairness' is used to evaluate whether AI models operate fairly in both comparative (where fairness is understood as formal equality, such as "treat like cases as like") and non-comparative (where unfairness arises from the model's inaccuracy, arbitrariness, or inscrutability) contexts. Recent advances in multimodal large language models (MLLMs) are breaking new ground in multimodal understanding, reasoning, and generation; however, we argue that inconspicuous distortions arising from complex multimodal interaction dynamics can lead to systematic bias. The purpose of this position paper is twofold: first, it is intended to acquaint AI researchers with phenomenological explainable approaches that rely on the physical entities that the machine experiences during training/inference, as opposed to the traditional cognitivist symbolic account or metaphysical approaches; second, it is to state that this phenomenological doctrine will be practically useful for tackling algorithmic fairness issues in MLLMs. We develop a surrogate physics-based model that describes transformer dynamics (i.e., semantic network structure and self-/cross-attention) to analyze the dynamics of cross-modal bias in MLLM, which are not fully captured by conventional embedding- or representation-level analyses. We support this position through multi-input diagnostic experiments: 1) perturbation-based analyses of emotion classification using Qwen2.5-Omni and Gemma 3n, and 2) dynamical analysis of Lorenz chaotic time-series prediction through the physical surrogate. Across two architecturally distinct MLLMs, we show that multimodal inputs can reinforce modality dominance rather than mitigate it, as revealed by structured error-attractor patterns under systematic label perturbation, complemented by dynamical analysis.
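For readers unfamiliar with the Lorenz benchmark named in the abstract, the sketch below generates a Lorenz time series and shows the sensitivity to initial conditions that makes it a demanding prediction target. It is not the authors' surrogate model. The parameter values (sigma = 10, rho = 28, beta = 8/3) are the classical chaotic setting and are an assumption here, since the abstract does not state them; the fixed-step RK4 integrator, the step size, and the function names are likewise illustrative choices.

```python
from typing import List, Tuple

State = Tuple[float, float, float]


def lorenz_rhs(state: State, sigma: float = 10.0, rho: float = 28.0,
               beta: float = 8.0 / 3.0) -> State:
    """Right-hand side of the Lorenz equations (classical parameters assumed)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state: State, dt: float) -> State:
    """Advance the state by one fixed-size fourth-order Runge-Kutta step."""
    def shifted(base: State, slope: State, scale: float) -> State:
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def lorenz_series(initial: State = (1.0, 1.0, 1.0), dt: float = 0.01,
                  steps: int = 1000) -> List[State]:
    """Return the trajectory as a list of (x, y, z) states, initial state included."""
    trajectory = [initial]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], dt))
    return trajectory


def final_separation(eps: float = 1e-8, steps: int = 3000) -> float:
    """Euclidean distance between two runs whose initial x differs by eps."""
    a = lorenz_series((1.0, 1.0, 1.0), steps=steps)[-1]
    b = lorenz_series((1.0 + eps, 1.0, 1.0), steps=steps)[-1]
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


if __name__ == "__main__":
    series = lorenz_series(steps=1000)
    print("samples:", len(series))
    print("separation after 3000 steps:", final_separation())
```

Running the script prints the number of samples and the distance between two trajectories that start 1e-8 apart; the gap grows by many orders of magnitude, which is why small input distortions matter for any model asked to forecast this system.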

0 Citations

0 Influential

3.5 Altmetric

17.5 Score

