2603.19565v1 Mar 20, 2026 cs.CV

PFM-VEPAR: Prompting Foundation Models for RGB-Event Camera based Pedestrian Attribute Recognition

Minghe Xu (Citations: 2, h-index: 1)
Rouying Wu (Citations: 2, h-index: 1)
Chiawei Chu (Citations: 6, h-index: 1)
Xiao Wang (Citations: 1, h-index: 1)
Yu Li (Citations: 0, h-index: 0)

Original Abstract

Event-based pedestrian attribute recognition (PAR) leverages motion cues to enhance RGB cameras in low-light and motion-blur scenarios, enabling more accurate inference of attributes like age and emotion. However, existing two-stream multimodal fusion methods introduce significant computational overhead and neglect the valuable guidance from contextual samples. To address these limitations, this paper proposes an Event Prompter. Discarding the computationally expensive auxiliary backbone, this module directly applies extremely lightweight and efficient Discrete Cosine Transform (DCT) and Inverse DCT (IDCT) operations to the event data. This design extracts frequency-domain event features at a minimal computational cost, thereby effectively augmenting the RGB branch. Furthermore, an external memory bank designed to provide rich prior knowledge, combined with modern Hopfield networks, enables associative memory-augmented representation learning. This mechanism effectively mines and leverages global relational knowledge across different samples. Finally, a cross-attention mechanism fuses the RGB and event modalities, followed by feed-forward networks for attribute prediction. Extensive experiments on multiple benchmark datasets fully validate the effectiveness of the proposed RGB-Event PAR framework. The source code of this paper will be released on https://github.com/Event-AHU/OpenPAR
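Two of the building blocks named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`event_prompt`, `hopfield_retrieve`), the choice to keep only the lowest-frequency DCT coefficients, and the 1-D toy inputs are all assumptions made for illustration. The first part applies a DCT, masks coefficients, and applies the IDCT; the second performs one retrieval step of a modern Hopfield network, which is a softmax-weighted average over stored memory patterns.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = math.sqrt(1.0 / n) * c[0]
        for k in range(1, n):
            s += math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def event_prompt(event_row, keep):
    """Hypothetical frequency-domain filter (an assumption, not the paper's design):
    keep the first `keep` DCT coefficients, zero the rest, then transform back."""
    coeffs = dct(event_row)
    filtered = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(filtered)


def hopfield_retrieve(query, memory, beta=1.0):
    """One modern Hopfield update: softmax(beta * <memory_row, query>) weights,
    then a weighted sum of the stored memory rows."""
    scores = [beta * sum(q * m for q, m in zip(query, row)) for row in memory]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * row[d] for w, row in zip(weights, memory))
            for d in range(len(query))]


if __name__ == "__main__":
    events = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]  # toy event counts
    print(event_prompt(events, keep=3))

    bank = [[1.0, 0.0], [0.0, 1.0]]  # toy memory bank with two stored patterns
    print(hopfield_retrieve([0.9, 0.1], bank, beta=50.0))
```

With a large `beta` the retrieval snaps to the closest stored pattern, while a small `beta` blends several patterns. In a full model the retrieved vector and the filtered event features would be fused with the RGB features, for example through cross-attention as the abstract describes.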

Citations: 0 · Influential: 0 · Altmetric: 46.60 · Score: 233.0
