2602.11685v1 Feb 12, 2026 cs.LG

DRACO: a Cross-Domain Benchmark for Deep Research Accuracy, Completeness, and Objectivity

Hao Zhang (Citations: 5,048; h-index: 9)
Clare Southern (Citations: 5; h-index: 1)
Jeremy Yang (Citations: 12; h-index: 2)
Denis Yarats (Citations: 7,771; h-index: 21)
Johnny Ho (Citations: 11; h-index: 2)
Jerry Ma (Citations: 33; h-index: 4)
J. Zhong (Citations: 62; h-index: 5)
Thomas Wang (Citations: 11,372; h-index: 11)
K. Jung (Citations: 1,269; h-index: 14)
Shu Zhang (Citations: 5; h-index: 1)

Original Abstract

We present DRACO (Deep Research Accuracy, Completeness, and Objectivity), a benchmark of complex deep research tasks. These tasks, which span 10 domains and draw on information sources from 40 countries, originate from anonymized real-world usage patterns within a large-scale deep research system. Tasks are sampled from a de-identified dataset of Perplexity Deep Research requests, then filtered and augmented to ensure that the tasks are anonymized, open-ended and complex, objectively evaluable, and representative of the broad scope of real-world deep research use cases. Outputs are graded against task-specific rubrics along four dimensions: factual accuracy (accuracy), breadth and depth of analysis (including completeness), presentation quality (including objectivity), and citation quality. DRACO is publicly available at https://hf.co/datasets/perplexity-ai/draco.
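
To make the grading description concrete, the following is a minimal sketch of how per-dimension scores could be computed from a task-specific rubric. Only the four dimension names are taken from the abstract. The `Criterion` structure, the `weight` field, the boolean verdicts, and the weighted-fraction aggregation are all hypothetical choices for illustration, and may differ from the protocol the authors actually use.

```python
from collections import defaultdict
from dataclasses import dataclass

# The four dimensions are named in the abstract; the identifiers are our own.
DIMENSIONS = (
    "factual_accuracy",
    "breadth_and_depth",
    "presentation_quality",
    "citation_quality",
)


@dataclass(frozen=True)
class Criterion:
    """One rubric item (hypothetical structure, not the dataset's schema)."""
    dimension: str        # must be one of DIMENSIONS
    text: str             # what the grader is asked to check
    weight: float = 1.0   # assumed weighting; the abstract does not mention weights


def score_by_dimension(rubric, verdicts):
    """Return the weighted fraction of satisfied criteria for each dimension.

    `verdicts` holds one boolean per criterion (True = criterion met).
    Dimensions with no criteria in the rubric are omitted from the result.
    """
    if len(rubric) != len(verdicts):
        raise ValueError("need exactly one verdict per rubric criterion")

    earned = defaultdict(float)
    possible = defaultdict(float)
    for criterion, met in zip(rubric, verdicts):
        if criterion.dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {criterion.dimension!r}")
        possible[criterion.dimension] += criterion.weight
        if met:
            earned[criterion.dimension] += criterion.weight

    return {d: earned[d] / possible[d] for d in DIMENSIONS if possible[d] > 0}
```

Under these assumptions, calling `score_by_dimension(rubric, verdicts)` with a list of `Criterion` objects and a matching list of grader verdicts yields a dictionary keyed by dimension, so the four aspects can be reported separately rather than collapsed into a single number.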
