FsaNet: Frequency Self-attention for Semantic Segmentation
Journal article, 2023

Considering the spectral properties of images, we propose a new self-attention mechanism with greatly reduced computational complexity, as low as linear. To better preserve edges while promoting similarity within objects, we propose separate processing for different frequency bands. In particular, we study the case where only the low-frequency components are processed. Through ablation studies, we show that low-frequency self-attention can achieve performance very close to, or better than, full-frequency self-attention, even without retraining the network. Accordingly, we design novel plug-and-play modules and embed them into the head of a CNN, which we refer to as FsaNet. The frequency self-attention 1) requires only a few low-frequency coefficients as input, 2) can be mathematically equivalent to spatial-domain self-attention with linear structures, and 3) simplifies the token mapping (1×1 convolution) stage and the token mixing stage simultaneously. We show that frequency self-attention requires 87.29% ~ 90.04% less memory, 96.13% ~ 98.07% fewer FLOPs, and 97.56% ~ 98.18% less run time than regular self-attention. Compared to other ResNet101-based self-attention networks, FsaNet achieves a new state-of-the-art result (83.0% mIoU) on the Cityscapes test set and competitive results on ADE20k and VOCaug. FsaNet can also enhance Mask R-CNN for instance segmentation on COCO. In addition, using the proposed module, SegFormer can be boosted across a series of models with different scales, and SegFormer-B5 can be improved even without retraining. Code is accessible at https://github.com/zfy-csu/FsaNet.
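The equivalence claim in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository above for that); it is a minimal illustration under several assumptions: a 1-D orthonormal DCT-II over flattened tokens stands in for the frequency transform, the attention is softmax-free ("linear structure"), and the names `dct_matrix`, `linear_attention`, `frequency_attention`, and `keep` are hypothetical. With all frequencies kept, the frequency-domain result matches spatial attention exactly; with only a few low-frequency coefficients, the attention matrix shrinks from n × n to `keep` × `keep` and the result becomes an approximation.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (row index = frequency)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)  # scale the DC row so that D @ D.T == I
    return d

def linear_attention(x, wq, wk, wv):
    """Softmax-free self-attention over the rows of x: (Q K^T) V."""
    q, k, v = x @ wq, x @ wk, x @ wv
    return (q @ k.T) @ v

def frequency_attention(x, wq, wk, wv, keep):
    """Attention on the `keep` lowest DCT coefficients, mapped back to tokens.

    Illustrative assumption: tokens are flattened to 1-D before the DCT.
    """
    d = dct_matrix(x.shape[0])[:keep]            # (keep, n) low-frequency basis
    x_hat = d @ x                                # (keep, c) frequency tokens
    y_hat = linear_attention(x_hat, wq, wk, wv)  # attention matrix is keep x keep
    return d.T @ y_hat                           # back to (n, c) spatial tokens

rng = np.random.default_rng(0)
n, c = 64, 8                                     # 64 tokens, 8 channels
x = rng.standard_normal((n, c))
wq, wk, wv = (rng.standard_normal((c, c)) for _ in range(3))

y_spatial = linear_attention(x, wq, wk, wv)
y_full = frequency_attention(x, wq, wk, wv, keep=n)  # all frequencies kept
y_low = frequency_attention(x, wq, wk, wv, keep=8)   # low frequencies only

print(np.allclose(y_spatial, y_full))
print(y_low.shape)
```

The exact match in the full-frequency case follows from the orthonormality of the transform: the factor D^T D between the key and value projections reduces to the identity. How well the low-frequency version approximates the full one depends on how much of the feature map's energy sits in the retained coefficients, which this random-input sketch does not attempt to model.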

linear complexity

semantic segmentation

low frequency

Self-attention

frequency decoupling

Author

Fengyu Zhang

Central South University

Ashkan Panahi

Chalmers, Computer Science and Engineering (Chalmers), Data Science and AI

Guangjun Gao

Central South University

IEEE Transactions on Image Processing

1057-7149 (ISSN) 19410042 (eISSN)

Vol. 32 4757-4772

Subject Categories

Language Technology (Computational Linguistics)

Signal Processing

DOI

10.1109/TIP.2023.3305090

PubMed

37594865

More information

Latest update

3/7/2024