论文网址:Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation | IEEE Journals & Magazine | IEEE Xplore

英文是纯手打的!论文原文的summarizing and paraphrasing。可能会出现难以避免的拼写错误和语法错误,若有发现欢迎评论指正!文章偏向于笔记,谨慎食用

目录

1. 心得

2. 论文逐段精读

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. YOLO Series Object Detectors

2.3.2. DETR Series Object Detectors

2.3.3. Hypergraph Learning Methods

2.4. Hypergraph Computation Empowered Semantic Collecting and Scattering Framework

2.5. Methods

2.5.1. Preliminaries

2.5.2. Hyper-YOLO Overview

2.5.3. Mixed Aggregation Network

2.5.4. Hypergraph-Based Cross-Level and Cross-Position Representation Network

2.5.5. Comparison and Analysis

2.6. Experiments

2.6.1. Experimental Setup

2.6.2. Results and Discussions

2.6.3. Ablation Studies on Backbone

2.6.4. Ablation Studies on Neck

2.6.5. More Ablation Studies

2.6.6. More Evaluation on Instance Segmentation Task

2.6.7. Visualization of High-Order Learning in Object Detection

2.7. Conclusion

1. 心得

(1)省流,清华佬的

(2)重要的图片补药放在补充材料里面啊我们看不到!!!

好想看wwwwww

2. 论文逐段精读

2.1. Abstract

        ①Limitations of traditional YOLO: neck can not efficiently aggregate cross-level feature or utilize the correlation of high order features

        ②Thus, they proposed Hypergraph Computation Empowered Semantic Collecting and Scattering (HGC-SCS)

2.2. Introduction

        ①Most of existing works fail to explore the high order relationship between features

        ②Performance:

2.3. Related Work

2.3.1. YOLO Series Object Detectors

        ①List different versions of YOLO and mention that they proposed an improved version

2.3.2. DETR Series Object Detectors

        ①DERT, which based on Transformer, is faster and more accurate than YOLO. However, it has plenty of parameters and performs worse on small object detection

        ②Transformer is similar to graph(感觉有点小共识怎么回事)

        ③Hyper-graph is able to solve the problem of Transformer

2.3.3. Hypergraph Learning Methods

        ①超图捕获高阶关系然后超图在计算机视觉还没有充分探索哈哈哈哈哈哈哈哈

2.4. Hypergraph Computation Empowered Semantic Collecting and Scattering Framework

        ①For feature map \mathbf{X}, hyper graph will construct it to f:\boldsymbol{X}\to\mathcal{G}. Then get the hyper feature map \mathbf{X}_{hyper}\mathbf{X} and \mathbf{X}_{hyper} will be fused to construct the hybrid feature map \mathbf{X}'

        ②Hypergraph Computation Empowered Semantic Collecting and Scattering (HGC-SCS) framework:

\begin{cases} \boldsymbol{X}_{mixed}\xleftarrow{\text{Collecting}}\{\boldsymbol{X}_{1},\boldsymbol{X}_{2},\ldots\} \\ \boldsymbol{X}_{hyper}=\text{HyperComputation}(\boldsymbol{X}_{mixed})//\text{High-Order} \\ \mathrm{Learning} \\ \{\boldsymbol{X}_{1}^{\prime},\boldsymbol{X}_{2}^{\prime},\ldots\}\xleftarrow{\text{Scattering}}\{\phi(\boldsymbol{X}_{hyper},\boldsymbol{X}_{1}),\phi(\boldsymbol{X}_{hyper},\boldsymbol{X}_{2}) \\ ,\ldots\} & \end{cases}

where \phi \left ( \cdot \right ) denotes the feature fusion function

2.5. Methods

2.5.1. Preliminaries

        ①Three scale outputs of the neck: \{N_3,N_4,N_5\}, which are small-scale, medium-scale, and large-scale feature map

        ②5 stages in backbone: \{B_1,B_2,B_3,B_4,B_5\}, the higher number denotes the semantic feature at higher level and deeper layer

2.5.2. Hyper-YOLO Overview

        ①感觉把上一节的内容又说了一下,说自己在那些地方提取特征

2.5.3. Mixed Aggregation Network

        ①The schematic of Mixed Aggregation Network (MANet):

where c in pictures denotes channel number

        ②The processes in MANet:

        ③The final output is fused by all of these feature:

\boldsymbol{X}_{out}=\mathrm{Conv}_o(\boldsymbol{X}_1||\boldsymbol{X}_2||\ldots||\boldsymbol{X}_{4+n})

prowess  n.造诣;高超的技艺;非凡的技能

2.5.4. Hypergraph-Based Cross-Level and Cross-Position Representation Network

        ①Pipeline of proposed Hypergraph-Based Cross-Level and Cross-Position Representation Network (HyperC2Net):

(1)Hypergraph Construction

        ①For hypergraph \mathcal{G}=\{\mathcal{V},\mathcal{E}\}\mathcal{V} denotes node set and \mathcal{E} is hyperedge set

        ②How to build hypergraph:

        ③Edges are screened by \epsilon-ball from each feature point:

\mathcal{E}=\{ball(v,\epsilon)\mid v\in\mathcal{V}\}

where ball(v,\epsilon)=\{u\mid||\boldsymbol{x}_u-\boldsymbol{x}_v||_d<\epsilon,u\in\mathcal{V}\}

(2)Hypergraph Convolution

        ①Hypergraph conv: spatial-domain hypergraph convolution with residual connection:

\left.\left\{ \begin{array} {l}\boldsymbol{x}_e=\frac{1}{|\mathcal{N}_v(e)|}\sum_{v\in\mathcal{N}_v(e)}\boldsymbol{x}_v\boldsymbol{\Theta} \\ \boldsymbol{x}_v^{\prime}=\boldsymbol{x}_v+\frac{1}{|\mathcal{N}_e(v)|}\sum_{e\in\mathcal{N}_e(v)}\boldsymbol{x}_e \end{array}\right.\right.

where \mathcal{N}_v(e)=\{v\mid v\in e,v\in\mathcal{V}\} and \mathcal{N}_e(v)=\{e\mid v\in e,e\in\mathcal{E}\}, where \Theta is trainable parameter

        ②The fomular of hyper graph convolution:

\mathrm{HyperConv}(\boldsymbol{X},\boldsymbol{H})=\boldsymbol{X}+\boldsymbol{D}_v^{-1}\boldsymbol{H}\boldsymbol{D}_e^{-1}\boldsymbol{H}^\top\boldsymbol{X}\boldsymbol{\Theta}

where \boldsymbol{D}_v and \boldsymbol{D}_e denote diagonal degree matrices of the vertices and hyperedges

(3)An Instance of HGC-SCS Framework

        ①Hypergraph-based cross-level and cross-position representation network (HyperC2Net):

\begin{cases} X_{mixed}=B_{1}||B_{2}||B_{3}||B_{4}||B_{5} \\ X_{hyper}=HyperConv(X_{mixed},H) \\ N_{3},N_{4},N_{5}=\phi(X_{hyper},B_{3}),\phi(X_{hyper},B_{4}), \\ \phi(X_{hyper},B_{4}) & \end{cases}

where \parallel denotes concatenation, \phi denotes fusion function

2.5.5. Comparison and Analysis

        ①They change PANet/gather-distribute neck to HyperC2Net

2.6. Experiments

2.6.1. Experimental Setup

        ①Performance on Microsoft COCO dataset:

where different convolutional layers and feature dimension takes different model size, -T (the last C2F in Bottom-Up stage is changed to 1×1 Conv), -N, -S, -M, -L

        ②Fair comparison: no pretraining and self-distillation strategies for all methods

        ③Input of all these models: 640×640 pixels

2.6.2. Results and Discussions

        ①性能好,参数少,小参数模型上性能显著提升

2.6.3. Ablation Studies on Backbone

        ①Ablation studies on backbone:

        ②Ablation studies on kernel size:

2.6.4. Ablation Studies on Neck

        ①Change hypergraph to traditional GCN:

        ②Ablation on feature map:

        ③Ablation on distance threshold:

        ④Ablation on distance:

2.6.5. More Ablation Studies

        ①Model scale ablation:

2.6.6. More Evaluation on Instance Segmentation Task

        ①Performance on instance segmentation:

2.6.7. Visualization of High-Order Learning in Object Detection

        ①Attention changing visualization:

2.7. Conclusion

        ~

Logo

脑启社区是一个专注类脑智能领域的开发者社区。欢迎加入社区,共建类脑智能生态。社区为开发者提供了丰富的开源类脑工具软件、类脑算法模型及数据集、类脑知识库、类脑技术培训课程以及类脑应用案例等资源。

更多推荐