[TPAMI 2025] Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation

①DETR, which is based on the Transformer, is faster and more accurate than YOLO. However, it has a large number of parameters and performs worse on small-object detection

②The Transformer is similar to a graph (this feels like it has become a bit of a consensus, what is going on?)

③Hypergraphs can address these problems of the Transformer

2.3.3. Hypergraph Learning Methods

①Hypergraphs capture high-order relations, yet hypergraphs have not been fully explored in computer vision (hahaha)

2.4. Hypergraph Computation Empowered Semantic Collecting and Scattering Framework

①For a feature map $\mathbf{X}$ , a hypergraph is constructed from it via $f:\boldsymbol{X}\to\mathcal{G}$ . Hypergraph computation on $\mathcal{G}$ then yields the hyper feature map $\mathbf{X}_{hyper}$ . Finally, $\mathbf{X}$ and $\mathbf{X}_{hyper}$ are fused into the hybrid feature map $\mathbf{X}'$

②Hypergraph Computation Empowered Semantic Collecting and Scattering (HGC-SCS) framework:

$\begin{cases} \boldsymbol{X}_{mixed}\xleftarrow{\text{Collecting}}\{\boldsymbol{X}_{1},\boldsymbol{X}_{2},\ldots\} \\ \boldsymbol{X}_{hyper}=\text{HyperComputation}(\boldsymbol{X}_{mixed}) \quad //\ \text{High-Order Learning} \\ \{\boldsymbol{X}_{1}^{\prime},\boldsymbol{X}_{2}^{\prime},\ldots\}\xleftarrow{\text{Scattering}}\{\phi(\boldsymbol{X}_{hyper},\boldsymbol{X}_{1}),\phi(\boldsymbol{X}_{hyper},\boldsymbol{X}_{2}),\ldots\} \end{cases}$

where $\phi \left ( \cdot \right )$ denotes the feature fusion function
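The collecting / computing / scattering pattern can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the function names `collect`, `hyper_computation`, `scatter`, and `phi_concat` are hypothetical, all features are assumed to be flattened to `(num_points, channels)` arrays with the same number of points, the high-order step is a placeholder, and $\phi$ is assumed to be channel concatenation.

```python
import numpy as np

def collect(features):
    """Collecting: concatenate the per-source features along the channel axis."""
    return np.concatenate(features, axis=1)

def hyper_computation(x_mixed):
    """Placeholder for high-order learning (assumption): every point adds the
    mean over all points, standing in for real hypergraph message passing."""
    return x_mixed + x_mixed.mean(axis=0, keepdims=True)

def phi_concat(x_hyper, x):
    """Fusion function phi, assumed here to be channel concatenation."""
    return np.concatenate([x_hyper, x], axis=1)

def scatter(x_hyper, features, phi):
    """Scattering: fuse the shared hyper feature back into every source."""
    return [phi(x_hyper, x) for x in features]

rng = np.random.default_rng(0)
x1 = rng.normal(size=(6, 4))   # 6 points, 4 channels
x2 = rng.normal(size=(6, 8))   # 6 points, 8 channels

x_mixed = collect([x1, x2])                      # shape (6, 12)
x_hyper = hyper_computation(x_mixed)             # shape (6, 12)
x1_new, x2_new = scatter(x_hyper, [x1, x2], phi_concat)
print(x_mixed.shape, x1_new.shape, x2_new.shape)  # (6, 12) (6, 16) (6, 20)
```

The point of the sketch is the data flow: every output $\boldsymbol{X}_i'$ sees the same $\boldsymbol{X}_{hyper}$, so information from all sources reaches each of them.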

2.5. Methods

2.5.1. Preliminaries

①The neck has three scale outputs, $\{N_3,N_4,N_5\}$ , which are the small-scale, medium-scale, and large-scale feature maps

②The backbone has 5 stages, $\{B_1,B_2,B_3,B_4,B_5\}$ ; a larger index denotes a higher-level semantic feature from a deeper layer

2.5.2. Hyper-YOLO Overview

①This feels like a restatement of the previous section, describing at which places the model extracts features

2.5.3. Mixed Aggregation Network

①The schematic of Mixed Aggregation Network (MANet):

where $c$ in the figure denotes the number of channels

②The processes in MANet:

③The final output fuses all of these features:

$\boldsymbol{X}_{out}=\mathrm{Conv}_o(\boldsymbol{X}_1||\boldsymbol{X}_2||\ldots||\boldsymbol{X}_{4+n})$

prowess n. great skill; outstanding ability; expertise

2.5.4. Hypergraph-Based Cross-Level and Cross-Position Representation Network

①Pipeline of proposed Hypergraph-Based Cross-Level and Cross-Position Representation Network (HyperC2Net):

（1）Hypergraph Construction

①For a hypergraph $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$ , $\mathcal{V}$ denotes the node set and $\mathcal{E}$ the hyperedge set

②How to build the hypergraph:

③Hyperedges are formed by an $\epsilon$ -ball around each feature point:

$\mathcal{E}=\{ball(v,\epsilon)\mid v\in\mathcal{V}\}$

where $ball(v,\epsilon)=\{u\mid||\boldsymbol{x}_u-\boldsymbol{x}_v||_d<\epsilon,u\in\mathcal{V}\}$
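As a small sketch of the $\epsilon$-ball construction, the hyperedge set can be stored as an incidence matrix with one hyperedge per vertex. The function name `epsilon_ball_incidence` is hypothetical, and Euclidean distance is an assumption here; the notes only specify a generic distance $||\cdot||_d$.

```python
import numpy as np

def epsilon_ball_incidence(x, eps):
    """Build the incidence matrix H of an epsilon-ball hypergraph.

    x   : (N, C) array, one feature vector per vertex.
    eps : distance threshold.
    H[u, e] = 1 when vertex u lies inside the ball centred on vertex e,
    so hyperedge e is ball(v_e, eps) and there are N hyperedges in total.
    """
    diff = x[:, None, :] - x[None, :, :]      # (N, N, C) pairwise differences
    dist = np.linalg.norm(diff, axis=-1)      # Euclidean distance (assumed metric)
    return (dist < eps).astype(np.float64)

# Four 2-D feature points: two tight pairs that are far from each other.
x = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]])
H = epsilon_ball_incidence(x, eps=1.0)
print(H)
```

In this toy input the balls around the first two points both contain exactly those two points, and likewise for the last two. Every vertex belongs to its own ball, so no vertex or hyperedge has degree zero.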

（2）Hypergraph Convolution

①Hypergraph conv: spatial-domain hypergraph convolution with residual connection:

$\begin{cases} \boldsymbol{x}_e=\frac{1}{|\mathcal{N}_v(e)|}\sum_{v\in\mathcal{N}_v(e)}\boldsymbol{x}_v\boldsymbol{\Theta} \\ \boldsymbol{x}_v^{\prime}=\boldsymbol{x}_v+\frac{1}{|\mathcal{N}_e(v)|}\sum_{e\in\mathcal{N}_e(v)}\boldsymbol{x}_e \end{cases}$

where $\mathcal{N}_v(e)=\{v\mid v\in e,v\in\mathcal{V}\}$ , $\mathcal{N}_e(v)=\{e\mid v\in e,e\in\mathcal{E}\}$ , and $\Theta$ is a trainable parameter

②The matrix form of the hypergraph convolution:

$\mathrm{HyperConv}(\boldsymbol{X},\boldsymbol{H})=\boldsymbol{X}+\boldsymbol{D}_v^{-1}\boldsymbol{H}\boldsymbol{D}_e^{-1}\boldsymbol{H}^\top\boldsymbol{X}\boldsymbol{\Theta}$

where $\boldsymbol{D}_v$ and $\boldsymbol{D}_e$ denote diagonal degree matrices of the vertices and hyperedges
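To check that the matrix form agrees with the two-stage message passing (vertex to hyperedge, then hyperedge to vertex), here is a minimal NumPy sketch. The function names are hypothetical, $\Theta$ is a random stand-in for trained weights and is assumed square so that the residual can be added, and every vertex and hyperedge is assumed to have non-zero degree.

```python
import numpy as np

def hyper_conv(x, h, theta):
    """Matrix form: X + D_v^{-1} H D_e^{-1} H^T X Theta."""
    d_v = h.sum(axis=1)                        # vertex degrees (hyperedges per vertex)
    d_e = h.sum(axis=0)                        # hyperedge degrees (vertices per hyperedge)
    x_e = (h.T @ x @ theta) / d_e[:, None]     # vertex -> hyperedge, mean aggregation
    return x + (h @ x_e) / d_v[:, None]        # hyperedge -> vertex, mean + residual

def hyper_conv_loops(x, h, theta):
    """The same computation written as explicit two-stage message passing."""
    n, m = h.shape
    x_e = np.zeros((m, theta.shape[1]))
    for e in range(m):
        members = np.flatnonzero(h[:, e])      # N_v(e): vertices in hyperedge e
        x_e[e] = (x[members] @ theta).mean(axis=0)
    out = x.copy()
    for v in range(n):
        edges = np.flatnonzero(h[v])           # N_e(v): hyperedges containing v
        out[v] += x_e[edges].mean(axis=0)
    return out

rng = np.random.default_rng(0)
h = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=np.float64)    # 4 vertices, 3 hyperedges
x = rng.normal(size=(4, 5))
theta = rng.normal(size=(5, 5))                # square so the residual shapes match

out = hyper_conv(x, h, theta)
assert np.allclose(out, hyper_conv_loops(x, h, theta))
```

Dividing by the degree vectors is equivalent to multiplying by $\boldsymbol{D}_e^{-1}$ and $\boldsymbol{D}_v^{-1}$ while avoiding explicit diagonal matrices.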

（3）An Instance of HGC-SCS Framework

①Hypergraph-based cross-level and cross-position representation network (HyperC2Net):

$\begin{cases} X_{mixed}=B_{1}||B_{2}||B_{3}||B_{4}||B_{5} \\ X_{hyper}=\mathrm{HyperConv}(X_{mixed},H) \\ N_{3},N_{4},N_{5}=\phi(X_{hyper},B_{3}),\phi(X_{hyper},B_{4}),\phi(X_{hyper},B_{5}) \end{cases}$

where $\parallel$ denotes concatenation, $\phi$ denotes fusion function
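Putting the pieces together, the instance above can be sketched end to end on toy data. Several things here are assumptions rather than details from the notes: the five backbone features are taken as already resampled to one spatial size and flattened to `(num_points, channels)`, the channel widths are made up, $\Theta$ is an identity stand-in for trained weights, $\phi$ is channel concatenation, and the resizing of $X_{hyper}$ back to each level's resolution is skipped.

```python
import numpy as np

def epsilon_ball_incidence(x, eps):
    """Incidence matrix with one epsilon-ball hyperedge per vertex."""
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return (dist < eps).astype(np.float64)

def hyper_conv(x, h, theta):
    """X + D_v^{-1} H D_e^{-1} H^T X Theta."""
    x_e = (h.T @ x @ theta) / h.sum(axis=0)[:, None]
    return x + (h @ x_e) / h.sum(axis=1)[:, None]

def phi(x_hyper, b_i):
    """Fusion function, assumed to be channel concatenation."""
    return np.concatenate([x_hyper, b_i], axis=1)

rng = np.random.default_rng(0)
n_points = 16                          # e.g. a 4x4 grid, flattened
channels = [2, 4, 8, 16, 32]           # hypothetical widths of B1..B5
# Assumption: B1..B5 were already resampled to the same spatial size.
b = [rng.normal(size=(n_points, c)) for c in channels]

x_mixed = np.concatenate(b, axis=1)                # B1 || B2 || B3 || B4 || B5
h = epsilon_ball_incidence(x_mixed, eps=11.0)      # threshold chosen arbitrarily
theta = np.eye(x_mixed.shape[1])                   # stand-in for trained weights
x_hyper = hyper_conv(x_mixed, h, theta)

# Scatter the shared hyper feature to the three deepest stages.
n3, n4, n5 = phi(x_hyper, b[2]), phi(x_hyper, b[3]), phi(x_hyper, b[4])
print(n3.shape, n4.shape, n5.shape)                # (16, 70) (16, 78) (16, 94)
```

Because the hypergraph is built on the concatenation of all five stages, each hyperedge can link points by their combined cross-level features and not only by spatial neighbourhood.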

2.5.5. Comparison and Analysis

①They replace the PANet / gather-distribute neck with HyperC2Net

2.6. Experiments

2.6.1. Experimental Setup

①Performance on Microsoft COCO dataset:

where the model sizes -T, -N, -S, -M, and -L differ in the number of convolutional layers and in feature dimension; in -T, the last C2F in the Bottom-Up stage is replaced with a 1×1 Conv

②Fair comparison: no pretraining or self-distillation strategies are used for any method

③Input of all these models: 640×640 pixels

2.6.2. Results and Discussions

①Good performance with few parameters; the improvement is especially significant on the small-parameter models

2.6.3. Ablation Studies on Backbone

①Ablation studies on backbone:

②Ablation studies on kernel size:

2.6.4. Ablation Studies on Neck

①Replacing the hypergraph with a traditional GCN:

②Ablation on feature map:

③Ablation on distance threshold:

④Ablation on distance:

2.6.5. More Ablation Studies

①Model scale ablation:

2.6.6. More Evaluation on Instance Segmentation Task

①Performance on instance segmentation:

2.6.7. Visualization of High-Order Learning in Object Detection

①Attention changing visualization:

2.7. Conclusion
