TBFF-DAC: Two-branch feature fusion based on deformable attention and convolution for object detection
Journal article, 2024
Designing architectures that fuse global and local information is an important topic in object detection. The central problem is the trade-off between high prediction precision and low computational cost: in general, precision improves only as computational cost increases, as in the YOLO series. To address this problem, we propose the two-branch feature fusion based on deformable attention and convolution (TBFF-DAC) model. The two-branch framework improves precision through a feature fusion head. Deformable attention extracts global features for the feature fusion head and allows the network's neurons to be pruned to decrease inference time. For the convolution branch, we use the extended efficient layer aggregation network (E-ELAN), the path aggregation feature pyramid network (PAFPN), and spatial pyramid pooling cross-stage partial (SPPCSP) modules. These modules effectively learn image features and improve detection precision. In addition, we design a deformable position loss to further improve precision. Our experiments show that the proposed model outperforms other feature fusion methods and performs well compared with DETR-based methods. Compared with the YOLO series, our model performs well on small and medium objects at a lower computational cost.
Keywords: Loss function; Convolution; Deformable attention; Object detection; Feature fusion
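
As a rough, self-contained illustration of the two-branch design summarized in the abstract, the sketch below pairs a simplified single-head deformable attention branch (learned sampling offsets mixed with softmax weights) with a plain convolutional branch standing in for the E-ELAN/SPPCSP path, and fuses the two with a 1x1 convolution head. The module names, channel sizes, number of sampling points, and the PyTorch framing are illustrative assumptions, not the paper's exact configuration.

# Minimal sketch of a two-branch fusion block: a simplified deformable-attention
# branch for global context plus a convolution branch for local features.
# All hyperparameters below are illustrative, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleDeformableAttention(nn.Module):
    """Single-head deformable attention: each location samples K offset points
    and mixes them with learned softmax weights (simplified from Deformable DETR)."""

    def __init__(self, channels: int, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        self.offsets = nn.Conv2d(channels, 2 * num_points, 1)  # (dx, dy) per sampling point
        self.weights = nn.Conv2d(channels, num_points, 1)      # attention weight per point
        self.value = nn.Conv2d(channels, channels, 1)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        value = self.value(x)
        offsets = self.offsets(x).view(b, self.num_points, 2, h, w)
        weights = self.weights(x).softmax(dim=1)                # (b, K, h, w)

        # Base sampling grid in normalized [-1, 1] coordinates for grid_sample.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=x.device),
            torch.linspace(-1, 1, w, device=x.device),
            indexing="ij",
        )
        base = torch.stack((xs, ys), dim=-1)                    # (h, w, 2), (x, y) order

        out = torch.zeros_like(value)
        for k in range(self.num_points):
            # Offsets are predicted directly in normalized units for simplicity.
            off = offsets[:, k].permute(0, 2, 3, 1)             # (b, h, w, 2)
            grid = (base.unsqueeze(0) + off).clamp(-1, 1)
            sampled = F.grid_sample(value, grid, align_corners=True)
            out = out + weights[:, k : k + 1] * sampled
        return self.proj(out)


class TwoBranchFusion(nn.Module):
    """Toy two-branch block: a deformable-attention branch (global context) and a
    convolution branch (local features), fused by a 1x1 convolution head."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.attn_branch = SimpleDeformableAttention(channels)
        self.conv_branch = nn.Sequential(                       # stand-in for E-ELAN/SPPCSP
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.SiLU(),
        )
        self.fusion_head = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = torch.cat((self.attn_branch(x), self.conv_branch(x)), dim=1)
        return self.fusion_head(fused)


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)
    print(TwoBranchFusion(64)(feats).shape)  # torch.Size([2, 64, 32, 32])

The fusion head here is a plain 1x1 convolution over concatenated branch outputs; it stands in for the paper's feature fusion head only as a structural placeholder.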