S

Steven Euijong Whang

Famous Author
Korea Advanced Institute of Science and Technology
Total Citations
5,291
h-index
28
Papers
4

Publications

#1 2602.04306v1 Feb 04, 2026

DeFrame: Debiasing Large Language Models Against Framing Effects

As large language models (LLMs) are increasingly deployed in real-world applications, ensuring their fair responses across demographics has become crucial. Despite many efforts, an ongoing challenge is hidden bias: LLMs appear fair under standard evaluations, but can produce biased responses outside those evaluation settings. In this paper, we identify framing -- differences in how semantically equivalent prompts are expressed (e.g., "A is better than B" vs. "B is worse than A") -- as an underexplored contributor to this gap. We first introduce the concept of "framing disparity" to quantify the impact of framing on fairness evaluation. By augmenting fairness evaluation benchmarks with alternative framings, we find that (1) fairness scores vary significantly with framing and (2) existing debiasing methods improve overall (i.e., frame-averaged) fairness, but often fail to reduce framing-induced disparities. To address this, we propose a framing-aware debiasing method that encourages LLMs to be more consistent across framings. Experiments demonstrate that our approach reduces overall bias and improves robustness against framing disparities, enabling LLMs to produce fairer and more consistent responses.

Kahee Lim Soyeon Kim Steven Euijong Whang
0 Citations
#2 2601.22888v1 Jan 30, 2026

Should LLMs, $\textit{like}$, Generate How Users Talk? Building Dialect-Accurate Dialog[ue]s Beyond the American Default with MDial

More than 80% of the 1.6 billion English speakers do not use Standard American English (SAE) and experience higher failure rates and stereotyped responses when interacting with LLMs as a result. Yet multi-dialectal performance remains underexplored. We introduce $\textbf{MDial}$, the first large-scale framework for generating multi-dialectal conversational data encompassing the three pillars of written dialect -- lexical (vocabulary), orthographic (spelling), and morphosyntactic (grammar) features -- for nine English dialects. Partnering with native linguists, we design an annotated and scalable rule-based LLM transformation to ensure precision. Our approach challenges the assumption that models should mirror users' morphosyntactic features, showing that up to 90% of the grammatical features of a dialect should not be reproduced by models. Independent evaluations confirm data quality, with annotators preferring MDial outputs over prior methods in 98% of pairwise comparisons for dialect naturalness. Using this pipeline, we construct the dialect-parallel $\textbf{MDialBench}$mark with 50k+ dialogs, resulting in 97k+ QA pairs, and evaluate 17 LLMs on dialect identification and response generation tasks. Even frontier models achieve under 70% accuracy, fail to reach 50% for Canadian English, and systematically misclassify non-SAE dialects as American or British. As dialect identification underpins natural language understanding, these errors risk cascading failures into downstream tasks.

Steven Euijong Whang Jio Oh Dezhi Hong Paul Vicinanza Thomas Butler +1
0 Citations
#3 2601.12654v2 Jan 19, 2026

Explanation Multiplicity in SHAP: Characterization and Assessment

Post-hoc explanations are widely used to justify, contest, and review automated decisions in high-stakes domains such as lending, employment, and healthcare. Among these methods, SHAP is often treated as providing a reliable account of which features mattered for an individual prediction and is routinely used to support recourse, oversight, and accountability. In practice, however, SHAP explanations can differ substantially across repeated runs, even when the individual, prediction task, and trained model are held fixed. We conceptualize and name this phenomenon explanation multiplicity: the existence of multiple, internally valid but substantively different explanations for the same decision. Explanation multiplicity poses a normative challenge for responsible AI deployment, as it undermines expectations that explanations can reliably identify the reasons for an adverse outcome. We present a comprehensive methodology for characterizing explanation multiplicity in post-hoc feature attribution methods, disentangling sources arising from model training and selection versus stochasticity intrinsic to the explanation pipeline. Furthermore, whether explanation multiplicity is surfaced depends on how explanation consistency is measured. Commonly used magnitude-based metrics can suggest stability while masking substantial instability in the identity and ordering of top-ranked features. To contextualize observed instability, we derive and estimate randomized baseline values under plausible null models, providing a principled reference point for interpreting explanation disagreement. Across datasets, model classes, and confidence regimes, we find that explanation multiplicity is widespread and persists even under highly controlled conditions, including high-confidence predictions. Thus explanation practices must be evaluated using metrics and baselines aligned with their intended societal role.

Steven Euijong Whang Hyunseung Hwang Seungeun Lee Julia Stoyanovich Lucas Rosenblatt
1 Citations
#4 2601.06225v1 Jan 09, 2026

Classroom AI: Large Language Models as Grade-Specific Teachers

Large Language Models (LLMs) offer a promising solution to complement traditional teaching and address global teacher shortages that affect hundreds of millions of children, but they fail to provide grade-appropriate responses for students at different educational levels. We introduce a framework for finetuning LLMs to generate age-appropriate educational content across six grade levels, from lower elementary to adult education. Our framework successfully adapts explanations to match students' comprehension capacities without sacrificing factual correctness. This approach integrates seven established readability metrics through a clustering method and builds a comprehensive dataset for grade-specific content generation. Evaluations across multiple datasets with 208 human participants demonstrate substantial improvements in grade-level alignment, achieving a 35.64 percentage point increase compared to prompt-based methods while maintaining response accuracy. AI-assisted learning tailored to different grade levels has the potential to advance educational engagement and equity.

James Evans Steven Euijong Whang Jindong Wang Jio Oh
0 Citations