arXiv:2602.09590v1 (cs.CL), Feb 10, 2026

Context-Aware Counterfactual Data Augmentation for Gender Bias Mitigation in Language Models

S. Parihar

Guangliang Liu

Lu Cheng

Natalie Parde


Abstract

A challenge in mitigating social bias in fine-tuned language models (LMs) is the potential reduction in language modeling capability, which can harm downstream performance. Counterfactual data augmentation (CDA), a widely used method for fine-tuning, highlights this issue by generating synthetic data that may align poorly with real-world distributions or creating overly simplistic counterfactuals that ignore the social context of altered sensitive attributes (e.g., gender) in the pretraining corpus. To address these limitations, we propose a simple yet effective context-augmented CDA method, Context-CDA, which uses large LMs to enhance the diversity and contextual relevance of the debiasing corpus. By minimizing discrepancies between the debiasing corpus and pretraining data through augmented context, this approach ensures better alignment, enhancing language modeling capability. We then employ uncertainty-based filtering to exclude generated counterfactuals considered low-quality by the target smaller LMs (i.e., LMs to be debiased), further improving the fine-tuning corpus quality. Experimental results on gender bias benchmarks demonstrate that Context-CDA effectively mitigates bias without sacrificing language modeling performance while offering insights into social biases by analyzing distribution shifts in next-token generation probabilities.
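Two mechanical pieces named in the abstract can be sketched without the authors' code: the simple word-swap counterfactual that standard CDA produces, and a filter that drops generated text the target model finds unlikely. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation. The word-pair list is a tiny hypothetical sample. Uncertainty is approximated by perplexity, because the abstract does not specify the exact uncertainty measure. The per-token log-probabilities are assumed to come from the target LM and are passed in as plain floats. `swap_gender`, `perplexity`, `filter_by_uncertainty`, and the `max_ppl` threshold are hypothetical names. The large-LM context-generation step of Context-CDA is not shown.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical, deliberately small list of gendered word pairs. Possessive and
# object pronouns ("his", "him", "her") are omitted because "her" maps
# ambiguously to either "his" or "him".
GENDER_PAIRS: List[Tuple[str, str]] = [
    ("he", "she"), ("man", "woman"), ("men", "women"),
    ("boy", "girl"), ("father", "mother"), ("son", "daughter"),
    ("brother", "sister"), ("husband", "wife"), ("king", "queen"),
]

# Build a symmetric lookup table so each word maps to its counterpart.
SWAP: Dict[str, str] = {}
for a, b in GENDER_PAIRS:
    SWAP[a] = b
    SWAP[b] = a


def swap_gender(sentence: str) -> str:
    """Naive word-level CDA: replace every listed gendered word with its
    counterpart, keeping an initial capital letter."""

    def replace(match: "re.Match[str]") -> str:
        word = match.group(0)
        target = SWAP.get(word.lower())
        if target is None:
            return word
        return target.capitalize() if word[0].isupper() else target

    # Matching whole alphabetic runs avoids rewriting substrings
    # (e.g. "man" inside "human").
    return re.sub(r"[A-Za-z]+", replace, sentence)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), assuming natural-log
    token log-probabilities obtained from the target LM."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def filter_by_uncertainty(
    candidates: Sequence[Tuple[str, Sequence[float]]], max_ppl: float
) -> List[str]:
    """Keep only candidate texts whose perplexity under the target LM is at
    most max_ppl; higher perplexity is treated as higher uncertainty."""
    return [text for text, logprobs in candidates if perplexity(logprobs) <= max_ppl]


if __name__ == "__main__":
    print(swap_gender("He met the king's brother."))  # She met the queen's sister.
    scored = [
        ("A plausible sentence.", [math.log(0.5)] * 4),  # perplexity about 2
        ("An odd sentence.", [math.log(0.01)] * 4),      # perplexity about 100
    ]
    print(filter_by_uncertainty(scored, max_ppl=10.0))  # ['A plausible sentence.']
```

The word-swap output shows the limitation the abstract describes: the counterfactual differs from the source only at the swapped tokens, carries no new context, and leaves unlisted gendered words untouched. A fixed perplexity threshold is only one possible filtering rule; keeping the lowest-uncertainty fraction of candidates would be an equally plausible alternative.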
