2604.01929v1 Apr 02, 2026 cs.SD

Woosh: A Sound Effects Foundation Model

Joan Serrà (Citations: 45; h-index: 4)
Yuki Mitsufuji (Citations: 48; h-index: 4)
Gaëtan Hadjeres (Citations: 1,565; h-index: 17)
Marc Ferras (Citations: 479; h-index: 12)
Khaled Koutini (Citations: 984; h-index: 16)
Benno Weck, Universitat Pompeu Fabra (Citations: 213; h-index: 6)
Alexandre Bittar (Citations: 20; h-index: 4)
Thomas Hummel (Citations: 202; h-index: 7)
Zineb Lahrici (Citations: 1; h-index: 1)
H. Missoum (Citations: 35; h-index: 2)

Abstract

The audio research community depends on open generative models as foundational tools for building novel approaches and establishing baselines. In this report, we present Woosh, Sony AI's publicly released sound effect foundation model, detailing its architecture, training process, and evaluation against other popular open models. Woosh is optimized for sound effects and provides (1) a high-quality audio encoder/decoder model and (2) a text-audio alignment model for conditioning, together with (3) text-to-audio and (4) video-to-audio generative models. The release also includes distilled text-to-audio and video-to-audio models, which allow low-resource operation and fast inference. Our evaluation on both public and private data shows competitive or better performance for each module compared with existing open alternatives such as StableAudio-Open and TangoFlux. Inference code and model weights are available at https://github.com/SonyResearch/Woosh. Demo samples can be found at https://sonyresearch.github.io/Woosh/.

