ARTA: Adaptive Mixed-Resolution Token Allocation for Efficient Semantic Segmentation

David Hagerman; Roman Naeem; Erik Brorsson; Fredrik Kahl; Lennart Svensson

doi:10.48550/arXiv.2603.26258

Preprint, 2026

We present ARTA, a mixed-resolution coarse-to-fine vision transformer for efficient dense feature extraction. Unlike models that begin with dense high-resolution (fine) tokens, ARTA starts with low-resolution (coarse) tokens and uses a lightweight allocator to predict which regions require more fine tokens. The allocator iteratively predicts a semantic (class) boundary score and allocates additional tokens to patches above a low threshold, concentrating token density near boundaries while maintaining high sensitivity to weak boundary evidence. This targeted allocation encourages tokens to represent a single semantic class rather than a mixture of classes. Mixed-resolution attention enables interaction between coarse and fine tokens, focusing computation on semantically complex areas while avoiding redundant processing in homogeneous regions. Experiments demonstrate that ARTA achieves state-of-the-art results on ADE20K and COCO-Stuff with substantially fewer FLOPs, and delivers competitive performance on Cityscapes at markedly lower compute. For example, ARTA-Base attains 54.6 mIoU on ADE20K in the ~100M-parameter class while using fewer FLOPs and less memory than comparable backbones.
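The coarse-to-fine allocation described above can be illustrated with a small sketch. This is not the authors' implementation: the quadtree-style splitting, the function names (`boundary_score`, `allocate_tokens`), the patch sizes, and the threshold value are all assumptions made for illustration. In particular, the learned allocator is replaced here by a simple stand-in that reads a label map directly and flags a patch as soon as it contains more than one class.

```python
import numpy as np

def boundary_score(labels, y, x, size):
    """Hypothetical stand-in for the learned allocator:
    1.0 if the patch mixes classes, 0.0 if it is homogeneous."""
    patch = labels[y:y + size, x:x + size]
    return float(patch.min() != patch.max())

def allocate_tokens(labels, coarse=8, finest=2, threshold=0.1):
    """Start from coarse patches and split any patch whose boundary
    score exceeds a low threshold, until the finest size is reached.
    Returns a list of (y, x, size) patches, one per token."""
    h, w = labels.shape
    pending = [(y, x, coarse)
               for y in range(0, h, coarse)
               for x in range(0, w, coarse)]
    tokens = []
    while pending:
        y, x, size = pending.pop()
        if size > finest and boundary_score(labels, y, x, size) > threshold:
            half = size // 2
            # Replace the patch by its four children (assumed quadtree split).
            pending.extend((y + dy, x + dx, half)
                           for dy in (0, half) for dx in (0, half))
        else:
            tokens.append((y, x, size))
    return tokens

# Toy 16x16 label map with a vertical class boundary at column 5.
labels = np.zeros((16, 16), dtype=np.int64)
labels[:, 5:] = 1

tokens = allocate_tokens(labels)
dense_count = (16 // 2) ** 2  # tokens needed if every patch were finest-size
print(len(tokens), "mixed-resolution tokens vs", dense_count, "dense tokens")
```

In this toy case the fine tokens cluster along the boundary column while the homogeneous right half stays at the coarsest resolution, which is the behaviour the abstract describes qualitatively. A real allocator would predict scores from features rather than from labels.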

Computer Vision

Semantic Segmentation

Efficient Machine Learning

Authors

David Hagerman

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Roman Naeem

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Erik Brorsson

Chalmers, Electrical Engineering, Systems and Control

Fredrik Kahl

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Lennart Svensson

Chalmers, Electrical Engineering, Signal Processing and Biomedical Engineering

Semi-supervised learning for medical image analysis

MedTech West

Subject categories (SSIF 2025)

Computer Vision and Learning Systems

DOI

10.48550/arXiv.2603.26258

Related datasets

ADE20K [dataset]

More information

Last updated

2026-06-26
