2107.02137 Jul 05, 2021 cs.AI

ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Yu Sun (Citations: 5,843; h-index: 22)
Shuohuan Wang (Citations: 3,343; h-index: 16)
Shikun Feng (Citations: 3,750; h-index: 8)
Siyu Ding (Citations: 807; h-index: 8)
Chao Pang (Citations: 995; h-index: 8)
Junyuan Shang (Citations: 2,314; h-index: 14)
Jiaxiang Liu (Citations: 1,292; h-index: 13)
Xuyi Chen (Citations: 1,949; h-index: 8)
Yanbin Zhao (Citations: 681; h-index: 2)
Zhihua Wu (Citations: 1,565; h-index: 16)
Weibao Gong (Citations: 772; h-index: 5)
Jianzhong Liang (Citations: 679; h-index: 3)
Zhizhou Shang (Citations: 593; h-index: 1)
Peng Sun (Citations: 1,654; h-index: 14)
Yujia Liu (Citations: 0; h-index: 0)
Ouyang Xuan (Citations: 965; h-index: 8)
Dianhai Yu (Citations: 4,377; h-index: 28)
Hao Tian (Citations: 5,159; h-index: 23)
Hua Wu (Citations: 15,575; h-index: 59)
Haifeng Wang (Citations: 12,553; h-index: 53)
Yuxiang Lu (Citations: 955; h-index: 7)
Weixin Liu (Citations: 706; h-index: 4)

Original Abstract

Pre-trained models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up pre-trained language models can improve their generalization abilities. Particularly, the GPT-3 model with 175 billion parameters shows its strong task-agnostic zero-shot/few-shot learning capabilities. Despite their success, these large-scale models are trained on plain texts without introducing knowledge such as linguistic knowledge and world knowledge. In addition, most large-scale models are trained in an auto-regressive way. As a result, this kind of traditional fine-tuning approach demonstrates relatively weak performance when solving downstream language understanding tasks. In order to solve the above problems, we propose a unified framework named ERNIE 3.0 for pre-training large-scale knowledge enhanced models. It fuses auto-regressive network and auto-encoding network, so that the trained model can be easily tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning or fine-tuning. We trained the model with 10 billion parameters on a 4TB corpus consisting of plain texts and a large-scale knowledge graph. Empirical results show that the model outperforms the state-of-the-art models on 54 Chinese NLP tasks, and its English version achieves the first place on the SuperGLUE benchmark (July 3, 2021), surpassing the human performance by +0.8% (90.6% vs. 89.8%).

593 Citations

135 Influential

29.5 Altmetric

1,010.5 Score

AI Analysis

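Summary

ERNIE 3.0, proposed by researchers at Baidu, is a large-scale knowledge-enhanced pre-trained model with 10 billion parameters. Earlier large-scale models rely only on plain text and tend to favor either natural language understanding (NLU) or natural language generation (NLG). To overcome these limitations, ERNIE 3.0 is trained on a 4TB corpus that combines plain text with a large-scale knowledge graph. Its unified framework fuses auto-regressive and auto-encoding networks, so the model performs strongly on both NLU and NLG tasks. It achieved state-of-the-art (SOTA) results on 54 Chinese NLP tasks and on the English SuperGLUE benchmark.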

Key Innovations

Continual Multi-Paradigms Unified Pre-training Framework
An architecture that separates the Universal Representation Module from the Task-specific Representation Modules
Universal Knowledge-Text Prediction (UKTP), a pre-training task that links knowledge graphs with unstructured text
Auto-encoding for natural language understanding (NLU) combined with an auto-regressive network for natural language generation (NLG)
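
To make the Universal Knowledge-Text Prediction idea more concrete, here is a minimal sketch of how a single training example might be assembled: a knowledge graph triple is paired with a sentence, and either the relation in the triple or some words in the sentence are masked. The function name `build_uktp_example`, the `[MASK]` and `[SEP]` tokens, the two probabilities, and the toy triple are all assumptions made for illustration; this is not the authors' data pipeline.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, chosen for illustration


def build_uktp_example(triple, sentence_tokens, rng,
                       mask_relation_prob=0.5, word_mask_prob=0.15):
    """Build one toy knowledge-text example.

    triple: (head, relation, tail) strings.
    sentence_tokens: tokens of the sentence paired with the triple.
    Returns (tokens, labels); labels[i] holds the original token at each
    masked position and None everywhere else.
    """
    triple_tokens = list(triple)
    tokens = triple_tokens + ["[SEP]"] + list(sentence_tokens)
    labels = [None] * len(tokens)

    if rng.random() < mask_relation_prob:
        # Mask the relation: it has to be recovered from the sentence.
        labels[1] = tokens[1]
        tokens[1] = MASK
    else:
        # Mask sentence words: the intact triple can serve as a hint.
        start = len(triple_tokens) + 1
        candidates = list(range(start, len(tokens)))
        k = min(len(candidates), max(1, round(word_mask_prob * len(candidates))))
        for i in rng.sample(candidates, k):
            labels[i] = tokens[i]
            tokens[i] = MASK
    return tokens, labels


# Toy usage with an illustrative triple and sentence.
example_triple = ("ERNIE 3.0", "proposed_by", "Baidu")
example_sentence = ["ERNIE", "3.0", "was", "proposed", "by", "Baidu", "researchers"]
tokens, labels = build_uktp_example(example_triple, example_sentence, random.Random(0))
print(tokens)
print(labels)
```

Both branches write their targets into the same `labels` list, so one prediction loss could cover relation masking and word masking alike. That choice is a simplification of this sketch.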

Learning & Inference Impact

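On the training side, ERNIE 3.0 shares its lower layers to learn common syntactic and lexical features and splits its upper layers into separate NLU and NLG branches. This resolves the conflicting requirements of the two task types (consistency for understanding vs. contextual information for generation) and improves training efficiency. Integrating knowledge graph triples into training also strengthens the model's reasoning and fact-recall abilities. On the inference side, a single model can flexibly handle zero-shot, few-shot, and fine-tuning scenarios, and knowledge enhancement has a positive effect on reducing hallucination and producing logical answers in question answering and text generation.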
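
The shared-lower-layers, split-upper-layers design described above can be sketched as a simple routing structure. The sketch below only illustrates which layers an input would pass through and which attention mask a branch would typically use: a full (bidirectional) mask for the auto-encoding NLU branch and a lower-triangular (causal) mask for the auto-regressive NLG branch. The class `ToyUnifiedModel`, the layer names, and the use of strings in place of real Transformer blocks are assumptions for illustration, not the paper's implementation.

```python
def bidirectional_mask(n):
    """Auto-encoding style: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Auto-regressive style: position i may attend only to positions <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


class ToyUnifiedModel:
    """Shared lower layers plus one task-specific upper branch per paradigm.

    Layers are plain strings here; a real model would hold Transformer blocks.
    """

    def __init__(self, shared_layers, nlu_layers, nlg_layers):
        self.shared_layers = list(shared_layers)
        self.branches = {
            "nlu": (list(nlu_layers), bidirectional_mask),
            "nlg": (list(nlg_layers), causal_mask),
        }

    def forward(self, tokens, task):
        """Return the ordered layer names visited and the branch's attention mask."""
        branch_layers, make_mask = self.branches[task]
        trace = self.shared_layers + branch_layers  # shared first, then task-specific
        return trace, make_mask(len(tokens))


# Toy usage: the same shared layers feed either branch.
model = ToyUnifiedModel(["shared_0", "shared_1"], ["nlu_0"], ["nlg_0"])
for task in ("nlu", "nlg"):
    trace, mask = model.forward(["a", "b", "c"], task)
    print(task, trace, mask)
```

Because both branches reuse `shared_layers`, anything learned there would be available to understanding and generation tasks alike, while each branch is free to specialize.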

Technical Difficulty

Advanced

Estimated implementation complexity based on methodology.
