Paper: [2502.20087] OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels

Code: GitHub - LMMMEng/OverLoCK: [CVPR 2025 Oral] OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels

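The English here is typed entirely by hand! These notes summarize and paraphrase the original paper. Spelling and grammar mistakes may be unavoidable; if you spot any, please point them out in the comments. This post is closer to personal notes, so read with caution.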

Contents

1. Takeaways

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.4. Methodology

2.4.1. Deep-stage Decomposition

2.4.2. Dynamic Convolution with Context-Mixing

2.4.3. Network Architecture

2.5. Experiments

2.5.1. Image Classification

2.5.2. Object Detection and Instance Segmentation

2.5.3. Semantic Segmentation

2.5.4. Ablation Studies

2.6. Conclusion

1. Takeaways

(1) Here to soak up some Oral luck

(2) The paper follows a very standard ConvNet write-up, and the model can be dropped in and run directly

2. Section-by-Section Close Reading

2.1. Abstract

        ①Challenge: the feature pyramid (downsampling) design does not realize a top-down attention mechanism

2.2. Introduction

        ①Key property of the top-down attention mechanism: guidance from a feedback signal

        ②Effective Receptive Fields (ERF) at stages 3 and 4:

where other models fail to localize the object at stage 3 because of their dependence on the classification (loss) signal

        ③Performance chart of OverLoCK and other compared models:

biomimetic  adj. bionic; imitating biological systems or processes

2.3. Related Work

        ①Covers classic ConvNets, dynamic convolutions, and biomimetic models

2.4. Methodology

2.4.1. Deep-stage Decomposition

        ①The overview of OverLoCK:

where the red lines are applied only in the pre-training stage

        ②Structures of each block:

where the input feature map is \mathbf{Z}_{i}\in\mathbb{R}^{C_{z}\times H\times W} and the input context prior is \mathbf{P}_{i}\in\mathbb{R}^{C_{p}\times H\times W}; the block outputs \mathbf{Z}_{i+1}\in\mathbb{R}^{C_z\times H\times W} and \mathbf{P}_{i}^{\prime}\in\mathbb{R}^{C_p\times H\times W}. To prevent the context prior from being diluted, the initial context prior \mathbf{P}_{o} is added back:

\mathbf{P}_{i+1}=\alpha\mathbf{P}_{i}^{\prime}+\beta\mathbf{P}_{o}

where \alpha and \beta are learnable scalars
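The fusion rule for the context prior can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `update_context_prior`, the tensor sizes, and the values of `alpha` and `beta` are all hypothetical, and plain floats stand in for the learnable scalars.

```python
import numpy as np

def update_context_prior(p_prime, p_init, alpha, beta):
    """Return P_{i+1} = alpha * P'_i + beta * P_o for priors of shape (C_p, H, W)."""
    if p_prime.shape != p_init.shape:
        raise ValueError("P'_i and P_o must share the shape (C_p, H, W)")
    return alpha * p_prime + beta * p_init

rng = np.random.default_rng(0)
C_p, H, W = 4, 8, 8                         # illustrative sizes, not taken from the paper
p_init = rng.standard_normal((C_p, H, W))   # initial context prior P_o
p_prime = rng.standard_normal((C_p, H, W))  # prior P'_i produced by block i
alpha, beta = 1.0, 0.5                      # hypothetical values; learnable in the real model

p_next = update_context_prior(p_prime, p_init, alpha, beta)
print(p_next.shape)
```

Because P_o is re-injected at every block, its contribution does not fade as the prior passes through successive blocks, which is the dilution problem this term is meant to address.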

2.4.2. Dynamic Convolution with Context-Mixing

        ①The pipeline of ContMix:

where \mathbf{Q}=\mathrm{Re}(\mathbf{W}_q\mathbf{X})\in\mathbb{R}^{C\times HW}, \mathbf{K}=\mathrm{Re}(\mathbf{W}_{k}\mathrm{Pool}(\mathbf{X}))\in\mathbb{R}^{C\times S^2}, and \mathrm{Re} denotes the reshape operator

        ②Evenly divide the channels of \mathbf{Q} and \mathbf{K} into G groups, obtaining \{\mathbf{Q^{g}}\}_{g=1}^{G} and \{\mathbf{K^{g}}\}_{g=1}^{G}, where \mathbf{Q^{g}}\in\mathbb{R}^{\frac{C}{G}\times HW} and \mathbf{K^{g}}\in\mathbb{R}^{\frac{C}{G}\times S^{2}}. The affinity matrices are computed as:

\{\mathbf{A^{g}}\}_{g=1}^{G}=\{(\mathbf{Q^{g}})^{\top}\mathbf{K^{g}}\}_{g=1}^{G}

where \mathbf{A^{g}}\in\mathbb{R}^{HW\times S^{2}}

        ③Define a linear projection \mathbf{W}_d\in\mathbb{R}^{S^2\times K^2}, and compute the dynamic kernel weights for each group:

\mathbf{D}^\mathbf{g}=\mathrm{softmax}(\mathbf{A}^\mathbf{g}\mathbf{W}_d)\in\mathbb{R}^{HW\times K^2}
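The three steps above (Q/K projection, grouped affinity, projection plus softmax) can be traced shape by shape in a short NumPy sketch. It is a sketch under stated assumptions, not the official code: \mathbf{W}_q and \mathbf{W}_k are modelled as C×C matrices (1×1 convolutions), Pool is assumed to be average pooling to an S×S grid, and the names `contmix_kernels` and `adaptive_avg_pool` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_avg_pool(x, S):
    """Average-pool (C, H, W) to (C, S, S); assumes H and W are divisible by S."""
    C, H, W = x.shape
    return x.reshape(C, S, H // S, S, W // S).mean(axis=(2, 4))

def contmix_kernels(x, w_q, w_k, w_d, G, S):
    """Return D of shape (G, HW, K^2): one softmax-normalized kernel per group and position."""
    C, H, W = x.shape
    q = w_q @ x.reshape(C, H * W)                        # Q: (C, HW)
    k = w_k @ adaptive_avg_pool(x, S).reshape(C, S * S)  # K: (C, S^2)
    q = q.reshape(G, C // G, H * W)                      # Q^g: (C/G, HW) per group
    k = k.reshape(G, C // G, S * S)                      # K^g: (C/G, S^2) per group
    a = np.einsum("gcn,gcs->gns", q, k)                  # A^g = (Q^g)^T K^g: (HW, S^2)
    return softmax(a @ w_d, axis=-1)                     # D^g: (HW, K^2)

rng = np.random.default_rng(0)
C, H, W, G, S, K = 8, 6, 6, 2, 3, 3                      # illustrative sizes only
x = rng.standard_normal((C, H, W))
w_q = rng.standard_normal((C, C))
w_k = rng.standard_normal((C, C))
w_d = rng.standard_normal((S * S, K * K))

D = contmix_kernels(x, w_q, w_k, w_d, G, S)
print(D.shape)
```

Each row of \mathbf{A^{g}} relates one spatial position to the S^2 pooled region centers, so every position's K×K kernel is conditioned on context from the whole feature map rather than only on a local window.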

2.4.3. Network Architecture

        ①Variants of OverLoCK: Extreme-Tiny (XT), Tiny (T), Small (S), and Base (B), which differ in channel widths, number of blocks, kernel sizes, and number of groups

2.5. Experiments

2.5.1. Image Classification

        ①Dataset: ImageNet-1k

        ②Optimizer: AdamW

        ③Stochastic depth rate: 0.1, 0.15, 0.4, and 0.5 for the OverLoCK-XT, -T, -S, and -B models, respectively
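For readers unfamiliar with stochastic depth, the sketch below shows how such a rate is commonly applied around a residual branch. Only the four rates come from the notes above; the per-sample masking, the rescaling by the keep probability, and the helper name `residual_with_stochastic_depth` are assumptions about a typical setup, not details confirmed for OverLoCK.

```python
import numpy as np

# Rates listed in the notes for each OverLoCK variant
DROP_PATH_RATE = {"XT": 0.1, "T": 0.15, "S": 0.4, "B": 0.5}

def residual_with_stochastic_depth(x, branch, drop_rate, rng, training=True):
    """Compute x + branch(x), randomly dropping the branch per sample during training."""
    if not training or drop_rate == 0.0:
        return x + branch(x)
    keep = 1.0 - drop_rate
    # One Bernoulli draw per sample, broadcast over all remaining dimensions
    mask = rng.random((x.shape[0],) + (1,) * (x.ndim - 1)) < keep
    # Rescale kept branches so the expected output matches evaluation mode
    return x + branch(x) * mask / keep

rng = np.random.default_rng(0)
x = np.ones((4, 2, 3, 3))     # (batch, channels, height, width), illustrative
branch = lambda t: 2.0 * t    # hypothetical stand-in for a network block

out_eval = residual_with_stochastic_depth(x, branch, DROP_PATH_RATE["B"], rng, training=False)
out_train = residual_with_stochastic_depth(x, branch, DROP_PATH_RATE["B"], rng, training=True)
print(out_eval.shape, out_train.shape)
```

The larger rates for the -S and -B variants fit the usual pattern of regularizing larger models more strongly.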

        ④Image classification performance:

where #F and #P denote the FLOPs and the number of parameters of a model, respectively. #T refers to the model type, where "C", "T", "M", and "H" stand for ConvNet, Transformer, Mamba, and hybrid models

2.5.2. Object Detection and Instance Segmentation

        ①Dataset: COCO 2017

        ②Frameworks: both Mask R-CNN and Cascade Mask R-CNN

        ③The backbone is pretrained on ImageNet-1K and then fine-tuned on COCO

        ④Performance of object detection on Mask R-CNN framework:

        ⑤Performance of object detection on Cascade Mask R-CNN framework:

2.5.3. Semantic Segmentation

        ①Dataset: ADE20K

        ②Framework: UperNet

        ③The backbone is pretrained on ImageNet-1K and then fine-tuned on ADE20K

        ④Semantic segmentation performance on ADE20K:

2.5.4. Ablation Studies

        ①Module ablation:

        ②Module comparison:

2.6. Conclusion

        ~
