2601.21572v1 Jan 29, 2026 cs.LG

Signal-Adaptive Trust Regions for Gradient-Free Optimization of Recurrent Spiking Neural Networks

Jinhao Li

Yuhao Sun

Zhiyuan Ma

Hao He

Xinche Zhang

Xing Chen

Jin Li

Sen Song

Abstract

Recurrent spiking neural networks (RSNNs) are a promising substrate for energy-efficient control policies, but training them for high-dimensional, long-horizon reinforcement learning remains challenging. Population-based, gradient-free optimization sidesteps backpropagation through non-differentiable spike dynamics by estimating gradients instead. However, with finite populations, the high variance of these estimates can induce harmful, overly aggressive update steps. Inspired by trust-region methods in reinforcement learning, which constrain policy updates in distribution space, we propose Signal-Adaptive Trust Regions (SATR), a distributional update rule that constrains relative change by bounding the KL divergence normalized by an estimated signal energy. SATR automatically expands the trust region when the signal is strong and contracts it when updates are noise-dominated. We instantiate SATR for Bernoulli connectivity distributions, which have shown strong empirical performance for RSNN optimization. Across a suite of high-dimensional continuous-control benchmarks, SATR improves stability under limited populations and reaches competitive returns against strong baselines, including PPO-LSTM. In addition, to make SATR practical at scale, we introduce a bitset implementation for binary spiking and binary weights that substantially reduces wall-clock training time and enables fast RSNN policy search.
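
The abstract describes SATR only at a high level, so what follows is a minimal sketch under stated assumptions, not the authors' implementation. It assumes (1) connectivity logits theta with connection probabilities sigmoid(theta), (2) a score-function (REINFORCE-style) population estimate of the gradient with a mean-fitness baseline, (3) a "signal energy" estimated as the debiased squared norm of the mean gradient, ||g_bar||^2 - tr(Sigma_hat)/N, and (4) a step along g_bar whose size is the largest alpha satisfying KL(new || old) <= delta * energy, found by bisection. All helper names (`score_function_grads`, `signal_energy`, `satr_step`) and the values of `delta` and `max_step` are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bernoulli_kl(q, p, eps=1e-12):
    """KL(Bern(q) || Bern(p)) for one coordinate, clamped for stability."""
    q = min(max(q, eps), 1.0 - eps)
    p = min(max(p, eps), 1.0 - eps)
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))


def total_kl(theta_new, theta_old):
    """Sum of per-connection KLs between factorized Bernoulli distributions."""
    return sum(bernoulli_kl(sigmoid(a), sigmoid(b)) for a, b in zip(theta_new, theta_old))


def score_function_grads(theta, fitness_fn, population, rng):
    """Per-sample estimates of d E[F(w)] / d theta for Bernoulli logits.

    For logits, d/dtheta log P(w) = w - sigmoid(theta). The mean fitness is an
    assumed baseline: it reduces variance but makes samples slightly correlated.
    """
    probs = [sigmoid(t) for t in theta]
    samples = [[1 if rng.random() < p else 0 for p in probs] for _ in range(population)]
    fitness = [fitness_fn(w) for w in samples]
    baseline = sum(fitness) / population
    return [[(f - baseline) * (wi - pi) for wi, pi in zip(w, probs)]
            for w, f in zip(samples, fitness)]


def signal_energy(per_sample):
    """Return (mean gradient, debiased ||mean||^2) from N >= 2 per-sample estimates.

    E||g_bar||^2 = ||mu||^2 + tr(Sigma)/N, so subtracting the estimated noise
    term leaves an (approximate) estimate of the signal energy ||mu||^2.
    """
    n, d = len(per_sample), len(per_sample[0])
    mean = [sum(g[i] for g in per_sample) / n for i in range(d)]
    mean_sq = sum(m * m for m in mean)
    var_trace = sum(sum((g[i] - mean[i]) ** 2 for g in per_sample) / (n - 1)
                    for i in range(d))
    return mean, max(mean_sq - var_trace / n, 0.0)


def satr_step(theta, mean_grad, energy, delta=0.5, max_step=10.0, iters=50):
    """Largest step alpha along mean_grad with KL(new || old) <= delta * energy."""
    budget = delta * energy

    def kl_at(alpha):
        return total_kl([t + alpha * g for t, g in zip(theta, mean_grad)], theta)

    if kl_at(max_step) <= budget:
        alpha = max_step
    else:
        lo, hi = 0.0, max_step  # invariant: lo is feasible, hi is not
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if kl_at(mid) <= budget:
                lo = mid
            else:
                hi = mid
        alpha = lo
    return [t + alpha * g for t, g in zip(theta, mean_grad)], alpha


if __name__ == "__main__":
    # Toy problem: recover a hidden binary connectivity pattern.
    rng = random.Random(0)
    target = [1, 0, 1, 1, 0, 0, 1, 0]

    def fitness(w):
        return sum(int(a == b) for a, b in zip(w, target))

    theta = [0.0] * len(target)
    for _ in range(50):
        grads = score_function_grads(theta, fitness, population=16, rng=rng)
        g, energy = signal_energy(grads)
        theta, alpha = satr_step(theta, g, energy)
    print("connection probabilities:", [round(sigmoid(t), 2) for t in theta])
```

Because the KL budget scales with the estimated energy, a noise-dominated batch (energy near zero) produces an almost null step, while a consistent gradient permits a larger move in distribution space. Bisection is valid here because each Bernoulli KL grows monotonically as its probability moves away from the old value along the ray.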

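The abstract states only that binary spikes and binary weights are stored as bitsets. As a minimal sketch of why this helps, assuming weights in {0, 1} (a connection is either present or absent, as in a sampled Bernoulli connectivity mask), the synaptic input to a neuron reduces to a bitwise AND followed by a population count instead of a floating-point dot product. Python integers stand in for machine words here; the neuron model in `rsnn_step` (leaky integrate-and-fire with the given `threshold` and `decay`) and all helper names are hypothetical illustrations, not the paper's code.

```python
import random


def pack_bits(bits):
    """Pack a sequence of 0/1 values into one Python int (bit i <- bits[i])."""
    word = 0
    for i, b in enumerate(bits):
        if b:
            word |= 1 << i
    return word


def popcount(x):
    # int.bit_count() exists from Python 3.10; bin().count works everywhere.
    return bin(x).count("1")


def synaptic_input_bitset(spike_word, weight_rows):
    """Input to postsynaptic neuron j = number of presynaptic neurons that
    spiked AND have a present (weight-1) connection to j."""
    return [popcount(spike_word & row) for row in weight_rows]


def synaptic_input_naive(spikes, weights):
    """Reference dot-product version, used only to check the bitset path."""
    return [sum(s * w for s, w in zip(spikes, row)) for row in weights]


def rsnn_step(v, spike_word, weight_rows, threshold=2.0, decay=0.5):
    """One step of a hypothetical leaky integrate-and-fire recurrent layer.

    Returns new membrane potentials and the packed output spike word, which
    can be fed straight back in as the next step's recurrent input.
    """
    currents = synaptic_input_bitset(spike_word, weight_rows)
    new_v, out = [], 0
    for j, (vj, cj) in enumerate(zip(v, currents)):
        vj = decay * vj + cj
        if vj >= threshold:
            out |= 1 << j  # neuron j spikes
            vj = 0.0       # reset after a spike
        new_v.append(vj)
    return new_v, out


if __name__ == "__main__":
    rng = random.Random(0)
    n = 64
    # Binary connectivity sampled from Bernoulli(0.1); weights[j][i] = 1 means i -> j.
    weights = [[int(rng.random() < 0.1) for _ in range(n)] for _ in range(n)]
    weight_rows = [pack_bits(row) for row in weights]
    spikes = [int(rng.random() < 0.2) for _ in range(n)]
    assert synaptic_input_bitset(pack_bits(spikes), weight_rows) == \
        synaptic_input_naive(spikes, weights)
    v, word = [0.0] * n, pack_bits(spikes)
    for _ in range(5):
        v, word = rsnn_step(v, word, weight_rows)
    print("active neurons after 5 steps:", popcount(word))
```

A production version would more likely use fixed-width machine words (for example, 64-bit) with hardware popcount instructions. If signed connectivity were needed, one option is to keep separate excitatory and inhibitory masks and subtract their popcounts.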