2603.01919v1 Mar 02, 2026 cs.CR

Real Money, Fake Models: Deceptive Model Claims in Shadow APIs

Yukun Jiang (Citations: 182, h-index: 7)

Yage Zhang (Citations: 39, h-index: 3)

Michael Backes (Citations: 1,281, h-index: 18)

Xinyue Shen (Citations: 1,620, h-index: 14)

Zeyuan Chen (Citations: 910, h-index: 6)

Yang Zhang (Citations: 59, h-index: 4)

Abstract

Access to frontier large language models (LLMs), such as GPT-5 and Gemini-2.5, is often hindered by high pricing, payment barriers, and regional restrictions. These limitations drive the proliferation of $\textit{shadow APIs}$, third-party services that claim to provide indirect access to official model services without regional limitations. Despite their widespread use, it remains unclear whether shadow APIs deliver outputs consistent with those of the official APIs, raising concerns about the reliability of downstream applications and the validity of research findings that depend on them. In this paper, we present the first systematic audit comparing official LLM APIs with their corresponding shadow APIs. We first identify 17 shadow APIs that have been used in 187 academic papers, with the most popular one reaching 5,966 citations and 58,639 GitHub stars by December 6, 2025. Through multidimensional auditing of three representative shadow APIs across utility, safety, and model verification, we uncover both indirect and direct evidence of deceptive practices in shadow APIs. Specifically, we reveal performance divergence reaching up to $47.21\%$, significant unpredictability in safety behaviors, and identity verification failures in $45.83\%$ of fingerprint tests. These deceptive practices critically undermine the reproducibility and validity of scientific research, harm the interests of shadow API users, and damage the reputation of official model providers.
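The two headline figures in the abstract (a performance divergence and a fingerprint failure rate) can be illustrated with a minimal sketch. The abstract does not define either metric, so this sketch assumes that divergence is the absolute accuracy gap in percentage points between an official and a shadow endpoint on the same benchmark items, and that a fingerprint test fails when the observed model identity differs from the claimed one. The names `accuracy`, `compare_endpoints`, and `fingerprint_failure_rate` are hypothetical and not taken from the paper, and the inputs in the usage note are made-up toy data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AuditResult:
    official_accuracy: float   # percent of items the official API got right
    shadow_accuracy: float     # percent of items the shadow API got right
    divergence_points: float   # absolute gap, in percentage points (assumed definition)


def accuracy(correct_flags: Sequence[bool]) -> float:
    """Return the share of graded items marked correct, as a percentage."""
    if not correct_flags:
        raise ValueError("need at least one graded item")
    return 100.0 * sum(correct_flags) / len(correct_flags)


def compare_endpoints(official_flags: Sequence[bool],
                      shadow_flags: Sequence[bool]) -> AuditResult:
    """Compare per-item correctness of two endpoints on the same benchmark."""
    if len(official_flags) != len(shadow_flags):
        raise ValueError("both endpoints must be graded on the same items")
    official = accuracy(official_flags)
    shadow = accuracy(shadow_flags)
    return AuditResult(official, shadow, abs(official - shadow))


def fingerprint_failure_rate(claimed_ids: Sequence[str],
                             observed_ids: Sequence[str]) -> float:
    """Percentage of fingerprint tests where the observed model is not the claimed one."""
    if not claimed_ids or len(claimed_ids) != len(observed_ids):
        raise ValueError("need one observed identity per claimed identity")
    failures = sum(c != o for c, o in zip(claimed_ids, observed_ids))
    return 100.0 * failures / len(claimed_ids)
```

With toy inputs, `compare_endpoints([True, True, True, False], [True, False, False, False])` reports accuracies of 75.0 and 25.0 and a gap of 50.0 points, and `fingerprint_failure_rate(["a", "a", "b", "b"], ["a", "x", "b", "b"])` gives 25.0. Whether the paper reports absolute or relative gaps is not stated in the abstract, so the choice of `abs(official - shadow)` is only one plausible reading.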

Citations: 7, Influential: 1, Altmetric: 9, Score: 54.0
