2601.12147v1 Jan 17, 2026 cs.CV

Segment and Matte Anything in a Unified Model

Kaushiki Nag (citations: 208, h-index: 8)
Kannan Achan (citations: 346, h-index: 10)
Zezhong Fan (citations: 107, h-index: 3)
Xiaohan Li (citations: 117, h-index: 4)
Topojoy Biswas (citations: 79, h-index: 4)

Abstract

Segment Anything (SAM) has recently pushed the boundaries of segmentation by demonstrating zero-shot generalization and flexible prompting after training on over one billion masks. Despite this, its mask prediction accuracy often falls short of the precision required in real-world applications. While several refinement modules have been proposed to boost SAM's segmentation quality, achieving highly accurate object delineation within a single, unified framework remains an open challenge. Furthermore, interactive image matting, which aims to generate fine-grained alpha mattes guided by diverse user hints, has not yet been explored in the context of SAM. Insights from recent studies highlight strong correlations between segmentation and matting, suggesting the feasibility of a unified model capable of both tasks. In this paper, we introduce Segment And Matte Anything (SAMA), a lightweight extension of SAM that delivers high-quality interactive image segmentation and matting with minimal extra parameters. Our Multi-View Localization Encoder (MVLE) captures detailed features from local views, while the Localization Adapter (Local-Adapter) refines mask outputs by recovering subtle boundary details. We also incorporate two prediction heads for each task into the architecture to generate segmentation and matting masks simultaneously. Trained on a diverse dataset aggregated from publicly available sources, SAMA achieves state-of-the-art performance across multiple segmentation and matting benchmarks, showcasing its adaptability and effectiveness in a wide range of downstream tasks.
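To make the distinction between the two outputs concrete, the following minimal sketch contrasts a hard segmentation mask with a continuous alpha matte. It is not SAMA's implementation: the function names `segmentation_head`, `matting_head`, and `composite` are hypothetical, and both "heads" simply read the same per-pixel scores as a stand-in for shared decoder features. The only fact relied on is the standard matting compositing relation I = alpha * F + (1 - alpha) * B.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def segmentation_head(scores, threshold=0.5):
    """Hypothetical head: hard 0/1 mask from per-pixel scores."""
    return [[1 if sigmoid(v) >= threshold else 0 for v in row] for row in scores]


def matting_head(scores):
    """Hypothetical head: continuous alpha values in (0, 1)."""
    return [[sigmoid(v) for v in row] for row in scores]


def composite(alpha, foreground, background):
    """Standard matting equation per pixel: I = alpha*F + (1 - alpha)*B."""
    return [
        [a * f + (1.0 - a) * b for a, f, b in zip(row_a, row_f, row_b)]
        for row_a, row_f, row_b in zip(alpha, foreground, background)
    ]


# Assumed toy input: one row of three pixels (background, boundary, foreground).
shared_scores = [[-4.0, 0.0, 4.0]]
mask = segmentation_head(shared_scores)   # hard decision per pixel
alpha = matting_head(shared_scores)       # soft opacity per pixel
blended = composite(alpha, [[1.0, 1.0, 1.0]], [[0.0, 0.0, 0.0]])
```

In this toy case the boundary pixel is forced to a hard label in the mask but keeps a fractional opacity in the matte, which is why mattes are preferred for fine structures such as hair or semi-transparent edges.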
