Paper link: MindPainter: Efficient Brain-Conditioned Painting of Natural Images via Cross-Modal Self-Supervised Learning | Proceedings of the AAAI Conference on Artificial Intelligence

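The English here is typed entirely by hand! It is a summarizing and paraphrasing of the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.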

Contents

1. Thoughts

2. Section-by-section close reading

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.4. Method

2.4.1. Our Method

2.5. Experiment

2.5.1. Dataset and Implementation

2.5.2. Qualitative Illustration

2.5.3. Comparisons

2.5.4. Ablation Study

2.5.5. Effectiveness of Designed Modules

2.6. Conclusion

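1. Thoughts

(1) Surely nobody actually gets a summer break, right?

(2) The task really is unusual: editing or customizing images with brain signals, something like Photoshopping with brain signals. Hard to keep a straight face.

2. Section-by-section close reading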

2.1. Abstract

        ①MindPainter aims to achieve brain-conditioned image painting

2.2. Introduction

        ①Two-step image editing, i.e. reconstruction followed by prompting, is inefficient because of the simple combination of models, the large modality gap and the limited representation ability

2.3. Related Work

        ①Lists other generation models and points out that none of them uses brain signals as supervision

2.4. Method

        ①The shape of input image: \mathbf{x}_{s}\in R^{H\times W\times3}, where H denotes height and W denotes width

        ②The editable region is \mathbf{m}\in\{0,1\}^{H\times W}, where 1 marks an editable position

        ③Brain condition: \mathbf{x}_{b}

        ④Goal: for input (\mathbf{x}_{s},\mathbf{x}_{b},\mathbf{m}), generate an image \mathbf{y} in which the region where \mathbf{m}=1 carries the semantics of \mathbf{x}_{b}

        ⑤Limitations of MindEye: a) inefficient because of model stacking, b) dependent on decoding accuracy, c) limited generation style

2.4.1. Our Method

        ①Overall pipeline:

        ②For input \mathbf{x}_{s}, they obtain the masked image \mathbf{\overline{m}}\odot\mathbf{x}_{s} and the masked-out content \mathbf{m}\odot\mathbf{x}_{s}
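The split in ② can be sketched in a few lines of NumPy. This is only an illustration, assuming \mathbf{\overline{m}}=1-\mathbf{m} and an H×W×3 float image; the function name `split_by_mask` and the toy sizes are mine, not the paper's.

```python
import numpy as np

def split_by_mask(x_s, m):
    """Split an image into kept background and masked-out content.

    x_s: (H, W, 3) float array; m: (H, W) binary array, 1 = editable.
    Assumes the complement mask is simply 1 - m.
    """
    m3 = m[..., None].astype(x_s.dtype)   # broadcast the mask over channels
    background = (1.0 - m3) * x_s         # \overline{m} ⊙ x_s
    content = m3 * x_s                    # m ⊙ x_s
    return background, content

rng = np.random.default_rng(0)
x_s = rng.random((4, 4, 3))               # toy image
m = np.zeros((4, 4), dtype=np.uint8)
m[1:3, 1:3] = 1                           # editable square in the middle
background, content = split_by_mask(x_s, m)
```

The two parts are disjoint and add back up to the original image, which is why the masked-out content can later serve as a self-supervised target.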

        ③The pseudo-brain condition \mathbf{x}_{b} is obtained as \mathbf{x}_{b}=\mathcal{G}_{\eta}(\mathbf{m}\odot\mathbf{x}_{s}), where \mathcal{G} is the Pseudo Brain Generator (PBG) with parameters \eta. The PBG is built from several residual linear layers

        ④Input: pairs \left ( B,I \right ), where B\in R^{N\times V} denotes the brain signals with batch size N and V voxels, and I\in R^{N\times256\times256} denotes the images

        ⑤Feed I to CLIP ViT-L/14 to obtain features Z of shape N\times1024. The PBG loss is:

\begin{aligned} Z_{i} & =CLIP_{Image}(I_{i}), \\ \mathcal{L}_{PBG} & =\sum_{i=1}^NL_{MSE}(\mathcal{G}(Z_i,\eta),B_i), \end{aligned}
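A toy NumPy sketch of this loss, under loud assumptions: the residual layer form `h + ReLU(hW + b)`, the layer count, the output projection and all sizes are illustrative guesses, and random vectors stand in for the CLIP features Z. Only the "sum of per-sample MSE between \mathcal{G}(Z_i) and B_i" part comes from the formula above.

```python
import numpy as np

def residual_linear(h, W, b):
    """One residual linear layer (assumed form): h + ReLU(h W + b)."""
    return h + np.maximum(h @ W + b, 0.0)

def pbg_forward(Z, params):
    """Map image features Z (N, D) to pseudo brain signals (N, V)."""
    h = Z
    for W, b in params["blocks"]:
        h = residual_linear(h, W, b)
    return h @ params["W_out"] + params["b_out"]

def pbg_loss(Z, B, params):
    """Sum over samples of the MSE between G(Z_i) and B_i."""
    pred = pbg_forward(Z, params)
    per_sample_mse = np.mean((pred - B) ** 2, axis=1)
    return per_sample_mse.sum()

rng = np.random.default_rng(0)
N, D, V = 4, 16, 32                        # toy sizes, not the paper's
params = {
    "blocks": [(rng.normal(0, 0.1, (D, D)), np.zeros(D)) for _ in range(2)],
    "W_out": rng.normal(0, 0.1, (D, V)),
    "b_out": np.zeros(V),
}
Z = rng.normal(size=(N, D))                # stand-in for CLIP image features
B = rng.normal(size=(N, V))                # stand-in for real fMRI
loss = pbg_loss(Z, B, params)
```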

        ⑥Brain Adapter (BA) is an MLP (residual linear layers) that maps the real and simulated brain signals B and B^* to embeddings E\in R^{N\times1024} and E^*\in R^{N\times1024}

        ⑦Calculate the similarity matrix between the image features and each set of embeddings, then apply the CLIP contrastive loss:

S=Z\cdot E^{\top}

S^{*}=Z\cdot E^{*\top}

\mathcal{L}_{\mathrm{real,contra}}=-\frac{1}{N}\sum_{i=1}^{N}\left[\log\frac{\exp(S_{ii}/\tau)}{\sum_{j=1}^{N}\exp(S_{ij}/\tau)}+\log\frac{\exp(S_{ii}/\tau)}{\sum_{j=1}^{N}\exp(S_{ji}/\tau)}\right]

\mathcal{L}_{\mathrm{simul,contra}}=-\frac{1}{N}\sum_{i=1}^{N}\left[\log\frac{\exp(S_{ii}^{*}/\tau)}{\sum_{j=1}^{N}\exp(S_{ij}^{*}/\tau)}+\log\frac{\exp(S_{ii}^{*}/\tau)}{\sum_{j=1}^{N}\exp(S_{ji}^{*}/\tau)}\right]

\mathcal{L}_{\mathrm{contra}}=\mathcal{L}_{\mathrm{real,contra}}+\mathcal{L}_{\mathrm{simul,contra}}
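The symmetric contrastive loss above can be written compactly with a log-softmax over rows and over columns of S. A minimal NumPy sketch follows; the temperature value 0.07, the L2 normalization of the toy embeddings and the helper names are assumptions of mine, not values from the paper.

```python
import numpy as np

def log_softmax(S, axis):
    """Numerically stable log-softmax along the given axis."""
    S = S - S.max(axis=axis, keepdims=True)
    return S - np.log(np.exp(S).sum(axis=axis, keepdims=True))

def symmetric_contrastive(Z, E, tau=0.07):
    """-(1/N) * sum_i [row-wise term + column-wise term] for S = Z E^T."""
    S = Z @ E.T / tau
    row = np.diag(log_softmax(S, axis=1))  # log exp(S_ii) / sum_j exp(S_ij)
    col = np.diag(log_softmax(S, axis=0))  # log exp(S_ii) / sum_j exp(S_ji)
    return -(row + col).mean()

def l2_normalize(A):
    return A / np.linalg.norm(A, axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, D = 4, 8                                             # toy sizes
Z = l2_normalize(rng.normal(size=(N, D)))               # image features
E = l2_normalize(Z + 0.1 * rng.normal(size=(N, D)))     # real-fMRI embeddings
E_star = l2_normalize(Z + 0.1 * rng.normal(size=(N, D)))  # pseudo-fMRI embeddings

loss_contra = symmetric_contrastive(Z, E) + symmetric_contrastive(Z, E_star)
```

A quick sanity check on the formula: if every entry of S is identical, both log terms equal -\log N, so each loss evaluates to 2\log N.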

        ⑧Loss of diffusion model:

\mathcal{L}_{t}^{\mathrm{cond}}=E_{x,\epsilon\sim\mathcal{N}(0,1),t}\left[\left\|\epsilon-\epsilon_{\theta}(x_{t},t,\mathcal{A}(B^{*},\sigma))\right\|_{2}^{2}\right]
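For intuition, the noise-prediction objective can be sketched as follows. Everything beyond the squared error between \epsilon and \epsilon_{\theta} is an assumption: the standard DDPM forward process and linear beta schedule are used for illustration, `toy_eps_theta` is a hypothetical placeholder for the real conditioned denoiser, and `cond` stands in for \mathcal{A}(B^{*},\sigma).

```python
import numpy as np

def add_noise(x0, eps, alpha_bar_t):
    """Standard DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def noise_prediction_loss(eps, eps_pred):
    """Squared L2 norm of (eps - eps_pred) per sample, averaged over the batch."""
    diff = (eps - eps_pred).reshape(eps.shape[0], -1)
    return np.mean(np.sum(diff ** 2, axis=1))

def toy_eps_theta(x_t, t, cond):
    """Hypothetical placeholder for eps_theta; it ignores its inputs."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)         # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.normal(size=(2, 8, 8, 3))         # toy clean samples
cond = rng.normal(size=(2, 1024))          # stand-in for the BA output
t = int(rng.integers(0, T))                # random timestep
eps = rng.normal(size=x0.shape)
x_t = add_noise(x0, eps, alpha_bar[t])
loss = noise_prediction_loss(eps, toy_eps_theta(x_t, t, cond))
```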

        ⑨Probabilities of inpainting, outpainting and random masking: 0.5, 0.3 and 0.2
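A sketch of how a training mask might be drawn with these probabilities. Only the three modes and the 0.5/0.3/0.2 split come from the notes; the concrete mask shapes (a central box for inpainting, its complement for outpainting, per-pixel coin flips for random masking) are illustrative assumptions.

```python
import numpy as np

MODES = ("inpaint", "outpaint", "random")
PROBS = (0.5, 0.3, 0.2)

def sample_mask(H, W, rng):
    """Draw a mask mode with the stated probabilities and build a toy mask."""
    mode = str(rng.choice(MODES, p=PROBS))
    m = np.zeros((H, W), dtype=np.uint8)
    h0, h1, w0, w1 = H // 4, 3 * H // 4, W // 4, 3 * W // 4
    if mode == "inpaint":
        m[h0:h1, w0:w1] = 1                # edit the central box (assumed shape)
    elif mode == "outpaint":
        m[:] = 1
        m[h0:h1, w0:w1] = 0                # edit everything outside the box
    else:
        m = (rng.random((H, W)) < 0.5).astype(np.uint8)  # per-pixel coin flip
    return mode, m

rng = np.random.default_rng(0)
mode, m = sample_mask(8, 8, rng)
```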

2.5. Experiment

2.5.1. Dataset and Implementation

        ①Datasets: NSD for fMRI and images, OpenImages for images

        ②Test: 100 fMRI samples from NSD randomly matched with 100 images from OpenImages

2.5.2. Qualitative Illustration

        ①Concatenation results:

        ②The same image conditioned on different fMRI signals:

        ③When the whole image is masked, MindPainter reconstructs the image relying on fMRI alone:

2.5.3. Comparisons

        ①Comparison:

        ②Human evaluation on 1200 results by 20 reviewers (scored 1 to 3, 3 is the best):

2.5.4. Ablation Study

        ①Ablation results:

2.5.5. Effectiveness of Designed Modules

        ①The effectiveness of pseudo-fMRI:

2.6. Conclusion

        ~
