2603.00576v1 Feb 28, 2026 cs.SD

Efficient Long-Sequence Diffusion Modeling for Symbolic Music Generation

Jiatao Chen (Citations: 384, h-index: 10)
Xing Tang (Citations: 64, h-index: 4)
Haoran Zhang (Citations: 2,850, h-index: 8)
Shenghua Yuan (Citations: 13, h-index: 2)
Tianming Xi (Citations: 0, h-index: 0)
Jing Wang (Citations: 4, h-index: 2)
Jinhang Xu (Citations: 61, h-index: 2)
Guangli Xiang (Citations: 119, h-index: 4)
Jiaojiao Yu (Citations: 28, h-index: 3)
Houpeng Yang (Citations: 0, h-index: 0)

Abstract

Symbolic music generation is a challenging task in multimedia generation: it involves long sequences with hierarchical temporal structure, long-range dependencies, and fine-grained local detail. Although recent diffusion-based models produce high-quality output, they tend to incur high training and inference costs on long symbolic sequences because of iterative denoising and costs that grow with sequence length. To address this problem, we propose SMDIM, a diffusion strategy that combines efficient global structure construction with lightweight local refinement. SMDIM uses structured state space models to capture long-range musical context at near-linear cost and selectively refines local musical details through a hybrid refinement scheme. Experiments on a wide range of symbolic music datasets, covering Western classical, popular, and traditional folk music, show that SMDIM outperforms other state-of-the-art approaches in both generation quality and computational efficiency, and that it generalizes robustly to underexplored musical styles. These results show that SMDIM offers a principled solution for long-sequence symbolic music generation, including the attributes that accompany the sequences. A project webpage with audio examples and supplementary materials is available at https://3328702107.github.io/smdim-music/.
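
The "near-linear cost" claim for state space models can be illustrated with a generic sketch. The code below is not the SMDIM architecture, which the abstract does not specify; it is a minimal diagonal linear state-space recurrence, and the function name `ssm_scan` and the parameters `a`, `b`, `c` are hypothetical names chosen for illustration. It shows only that one pass over a sequence of length L costs O(L · N) for a state of size N, whereas self-attention compares every pair of positions.

```python
from typing import List


def ssm_scan(u: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Run a diagonal linear state-space recurrence over a 1-D input sequence.

    Illustrative assumption, not the paper's model:
        x_t = a * x_{t-1} + b * u_t    (elementwise over the state dimension)
        y_t = sum(c * x_t)
    """
    n = len(a)
    x = [0.0] * n  # hidden state, initialised to zero
    y = []
    for u_t in u:  # a single pass: work grows linearly with len(u)
        x = [a[i] * x[i] + b[i] * u_t for i in range(n)]
        y.append(sum(c[i] * x[i] for i in range(n)))
    return y


if __name__ == "__main__":
    # An impulse input decays geometrically with rate a = 0.5.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
```

The state `x` carries context from all earlier steps in a fixed-size vector, which is why doubling the sequence length roughly doubles the work in this sketch.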
