2603.12743v1 Mar 13, 2026 cs.CV

MoKus: Leveraging Cross-Modal Knowledge Transfer for Knowledge-Aware Concept Customization

Chenyang Zhu (Citations: 28, h-index: 3)

Hongxiang Li (Citations: 82, h-index: 3)

Xiujun Li (Citations: 0, h-index: 0)

Long Chen (Citations: 43, h-index: 2)

Abstract

Concept customization typically binds rare tokens to a target concept. These approaches often suffer from unstable performance because the pretraining data seldom contains such rare tokens, and the tokens themselves fail to convey the inherent knowledge of the target concept. We therefore introduce Knowledge-aware Concept Customization, a novel task that aims to bind diverse textual knowledge to target visual concepts. The task requires the model to identify the knowledge within the text prompt in order to perform high-fidelity customized generation, and to bind all of the textual knowledge to the target concept efficiently. To this end, we propose MoKus, a novel framework for knowledge-aware concept customization. The framework relies on a key observation, cross-modal knowledge transfer: modifying knowledge within the text modality naturally transfers to the visual modality during generation. Inspired by this observation, MoKus contains two stages: (1) in visual concept learning, we first learn an anchor representation that stores the visual information of the target concept; (2) in textual knowledge updating, we update the answers to the knowledge queries to the anchor representation, enabling high-fidelity customized generation. To evaluate MoKus comprehensively on the new task, we introduce KnowCusBench, the first benchmark for knowledge-aware concept customization. Extensive evaluations demonstrate that MoKus outperforms state-of-the-art methods. Moreover, cross-modal knowledge transfer allows MoKus to be easily extended to other knowledge-aware applications such as virtual concept creation and concept erasure. We also demonstrate that our method achieves improvements on world knowledge benchmarks.
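
The two-stage structure described in the abstract can be illustrated with a toy sketch. This is not the MoKus implementation: the abstract does not specify the model, the loss, or the update rule. The sketch assumes, purely for illustration, that the anchor is a plain vector fitted to visual features and that knowledge lives in a linear key-to-value map edited with a rank-one update. The names `learn_anchor`, `bind_query`, and `matvec` are hypothetical.

```python
# Toy illustration only. The names, shapes, and update rule below are
# assumptions made for this sketch, not the method used in the paper.


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def learn_anchor(target_features, steps=200, lr=0.1):
    """Stage 1 (sketch): fit an anchor vector to the target concept's
    visual features by gradient descent on the mean squared error."""
    dim = len(target_features[0])
    n = len(target_features)
    anchor = [0.0] * dim
    for _ in range(steps):
        # Half the gradient of (1/n) * sum ||anchor - f||^2 w.r.t. anchor.
        grad = [sum(anchor[d] - f[d] for f in target_features) / n
                for d in range(dim)]
        anchor = [a - lr * g for a, g in zip(anchor, grad)]
    return anchor


def bind_query(W, key, anchor):
    """Stage 2 (sketch): rank-one update of a linear key-to-value map so
    that the knowledge query `key` now maps to `anchor`. Keys orthogonal
    to `key` keep their original outputs."""
    residual = [a - y for a, y in zip(anchor, matvec(W, key))]
    norm_sq = sum(k * k for k in key)
    return [[w + r * k / norm_sq for w, k in zip(row, key)]
            for row, r in zip(W, residual)]
```

For example, calling `learn_anchor([[1.0, 0.0], [0.0, 1.0]])` converges to the feature mean, and `bind_query(W, key, anchor)` returns a map whose output for `key` equals the anchor while outputs for keys orthogonal to `key` are unchanged. That locality is the property this toy update is meant to show; whether MoKus uses a comparable mechanism is not stated in the abstract.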
