Paper link: BrainChat: Interactive Semantic Information Decoding from fMRI Using Large-Scale Vision-Language Pretrained Models | IEEE Conference Publication | IEEE Xplore

Paper code: GitHub - HuangWanqiu/BrainChat-Code: BrainChat: Decoding Semantic Information from fMRI using Vision-language Pretrained Models

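The English here is typed by hand! It summarizes and paraphrases the original paper. Spelling and grammar mistakes may be unavoidable; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.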

Contents

1. Thoughts

2. Section-by-section close reading

2.1. Abstract

2.2. Introduction

2.3. Method

2.4. Experiment

2.4.1. Implementation

2.4.2. fMRI Captioning Evaluation

2.4.3. fQA Evaluation

2.4.4. Adapting BrainChat without Image Data

2.5. Conclusion

3. Reference

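1. Thoughts

(1) Are you the paper I was destined for? I cannot conclude that yet; I strongly doubt whether the "key code" in the repository actually contains what I need

2. Section-by-section close reading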

2.1. Abstract

        ①Task: decode the semantic information of fMRI

2.2. Introduction

        ①Task of BrainChat:

        ②Integrated techniques: Contrastive Captioner (CoCa) and Masked Brain Modeling (MBM)

        ③Training mode: encoder/decoder pre-training and regression decoding

aphasia  n. loss of the ability to produce or understand language

2.3. Method

        ①Framework of BrainChat:

        ②Pre-training stage: pre-train the encoder f_\theta and the decoder with MBM. Patchify the fMRI data and set the masked patches to 0. The model reconstructs the masked data and is trained with a mean squared error (MSE) loss.
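The masking step can be illustrated with a small sketch in plain Python. The helper names (`patchify`, `mask_patches`, `masked_mse`), the patch size of 4 and the 64-value toy signal are all hypothetical; computing the MSE only over the masked patches is an assumption borrowed from common masked-autoencoder practice, not something these notes confirm for BrainChat.

```python
import random

def patchify(signal, patch_size):
    """Split a 1-D voxel vector into consecutive, equally sized patches."""
    if len(signal) % patch_size != 0:
        raise ValueError("signal length must be a multiple of patch_size")
    return [signal[i:i + patch_size] for i in range(0, len(signal), patch_size)]

def mask_patches(patches, mask_ratio, rng):
    """Zero out a random subset of patches; return the masked copy and the masked indices."""
    n_masked = int(len(patches) * mask_ratio)
    masked_idx = sorted(rng.sample(range(len(patches)), n_masked))
    masked_set = set(masked_idx)
    masked = [[0.0] * len(p) if i in masked_set else list(p)
              for i, p in enumerate(patches)]
    return masked, masked_idx

def masked_mse(pred, target, masked_idx):
    """Mean squared error over the masked patches only (assumed convention)."""
    errors = [(p - t) ** 2
              for i in masked_idx
              for p, t in zip(pred[i], target[i])]
    return sum(errors) / len(errors)

rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(64)]   # toy stand-in for fMRI voxels
patches = patchify(signal, patch_size=4)            # 16 patches of 4 values
masked, masked_idx = mask_patches(patches, mask_ratio=0.75, rng=rng)

# A "model" that simply echoes its zeroed input is penalized by the
# mean square of the hidden values; a perfect reconstruction scores 0.
baseline_loss = masked_mse(masked, patches, masked_idx)
```

With a mask ratio of 0.75, 12 of the 16 toy patches are hidden, so the reconstruction has to rely on the 4 visible ones.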

        ③The encoded fMRI data is projected by p_\theta to align with the image/text embeddings extracted by the frozen image encoder g_\theta and text encoder q_\theta from CoCa

        ④fMRI-image contrastive loss L_{fi}, fMRI-text contrastive loss L_{ft}:

L_{fi}=-\left(\underbrace{\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(p_{\boldsymbol{\theta}}(f_{\boldsymbol{\theta}}(b_{i}))^{T}g_{\boldsymbol{\theta}}(v_{i})/\sigma)}{\sum_{j=1}^{N}\exp(p_{\boldsymbol{\theta}}(f_{\boldsymbol{\theta}}(b_{i}))^{T}g_{\boldsymbol{\theta}}(v_{j})/\sigma)}}_{\text{fMRI-to-image}}+\underbrace{\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(g_{\boldsymbol{\theta}}(v_{i})^{T}p_{\boldsymbol{\theta}}(f_{\boldsymbol{\theta}}(b_{i}))/\sigma)}{\sum_{j=1}^{N}\exp(g_{\boldsymbol{\theta}}(v_{i})^{T}p_{\boldsymbol{\theta}}(f_{\boldsymbol{\theta}}(b_{j}))/\sigma)}}_{\text{image-to-fMRI}}\right)

L_{ft}=-\left(\underbrace{\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(p_{\boldsymbol{\theta}}(f_{\boldsymbol{\theta}}(b_{i}))^{T}q_{\boldsymbol{\theta}}(t_{i})/\sigma)}{\sum_{j=1}^{N}\exp(p_{\boldsymbol{\theta}}(f_{\boldsymbol{\theta}}(b_{i}))^{T}q_{\boldsymbol{\theta}}(t_{j})/\sigma)}}_{\text{fMRI-to-text}}+\underbrace{\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(q_{\boldsymbol{\theta}}(t_{i})^{T}p_{\boldsymbol{\theta}}(f_{\boldsymbol{\theta}}(b_{i}))/\sigma)}{\sum_{j=1}^{N}\exp(q_{\boldsymbol{\theta}}(t_{i})^{T}p_{\boldsymbol{\theta}}(f_{\boldsymbol{\theta}}(b_{j}))/\sigma)}}_{\text{text-to-fMRI}}\right)

where (b_{i},v_{i},t_{i}) denotes a paired fMRI-image-text sample, p_{\theta}(f_{\theta}(b_{i})), g_{\theta}(v_{i}) and q_{\theta}(t_{i}) are the embeddings of the fMRI, image and text in the i-th pair, N is the batch size and \sigma is the temperature parameter
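Both contrastive losses share the same two-directional form, so a single function can cover them. The sketch below is plain Python for readability (a real implementation would use batched tensors); `symmetric_contrastive_loss` and the toy 2-D embeddings are hypothetical names and values, where `a[i]` stands for the projected fMRI embedding and `b[i]` for the paired image or text embedding.

```python
import math

def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(x * y for x, y in zip(u, v))

def log_softmax_at(logits, index):
    """log(exp(logits[index]) / sum_j exp(logits[j])), computed stably."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_norm

def symmetric_contrastive_loss(a, b, sigma):
    """Two-directional contrastive loss between paired embeddings a[i] <-> b[i]."""
    n = len(a)
    # sim[i][j] = a_i^T b_j / sigma
    sim = [[dot(a[i], b[j]) / sigma for j in range(n)] for i in range(n)]
    # a -> b: for each a_i, the matching b_i competes against all b_j
    a_to_b = sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # b -> a: for each b_i, the matching a_i competes against all a_j
    b_to_a = sum(log_softmax_at([sim[j][i] for j in range(n)], i)
                 for i in range(n)) / n
    return -(a_to_b + b_to_a)

fmri_emb = [[1.0, 0.0], [0.0, 1.0]]      # toy projected fMRI embeddings
image_emb = [[1.0, 0.0], [0.0, 1.0]]     # correctly paired toy image embeddings
swapped_emb = [[0.0, 1.0], [1.0, 0.0]]   # deliberately mismatched pairs

aligned = symmetric_contrastive_loss(fmri_emb, image_emb, sigma=0.1)
misaligned = symmetric_contrastive_loss(fmri_emb, swapped_emb, sigma=0.1)
```

The aligned pairs give a loss close to zero while the swapped pairs give a large one, which is the behaviour the loss is meant to reward.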

        ⑤Caption loss L_{cap}:

L_{cap}=-\sum_{k=1}^{T}\log P_{\theta}\left(t_{k}\mid t_{<k},b\right)

where P_{\theta}(t_{k}\mid t_{<k},b) represents the probability of generating the token t_k conditioned on the text from previous time steps t_{<k} and the fMRI data b
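As a worked example of the caption loss, the sketch below sums the negative log-probabilities that a decoder assigns to the reference tokens. The three-word vocabulary and the probability values are made up for illustration; `caption_loss` is a hypothetical helper, not a function from the released code.

```python
import math

def caption_loss(step_probs, target_ids):
    """Sum over time steps of -log P(t_k | t_<k, b) for the reference tokens."""
    return -sum(math.log(probs[t]) for probs, t in zip(step_probs, target_ids))

# step_probs[k] is a hypothetical next-token distribution at step k,
# already conditioned on the previous tokens and the fMRI input.
step_probs = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.25, 0.25, 0.50],
]
target_ids = [0, 1, 2]   # indices of the reference caption tokens

loss = caption_loss(step_probs, target_ids)   # -(log 0.7 + log 0.8 + log 0.5)
```

The loss shrinks toward 0 as the probability assigned to each reference token approaches 1.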

        ⑥Total loss:

L_{BrainChat}=\lambda_{fi}L_{fi}+\lambda_{ft}L_{ft}+\lambda_{cap}L_{cap}

where \lambda_{fi}, \lambda_{ft} and \lambda_{cap} are the loss weights
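The weighted sum is simple arithmetic, sketched here with made-up loss values. The default weights follow the Implementation notes in this post (20 for the caption loss, 1 for each contrastive loss); setting `w_fi=0.0` to mimic training without the image branch is my own assumption about how that variant could be expressed.

```python
def brainchat_total_loss(l_fi, l_ft, l_cap, w_fi=1.0, w_ft=1.0, w_cap=20.0):
    """Weighted sum of the fMRI-image, fMRI-text and caption losses."""
    return w_fi * l_fi + w_ft * l_ft + w_cap * l_cap

# Hypothetical per-batch loss values, for illustration only.
full = brainchat_total_loss(l_fi=0.5, l_ft=0.25, l_cap=0.1)
no_image = brainchat_total_loss(l_fi=0.5, l_ft=0.25, l_cap=0.1, w_fi=0.0)
```

With these numbers the caption term contributes 2.0 of the 2.75 total, which shows how strongly a weight of 20 favours the captioning objective.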

        ⑦fMRI captioning task: input the first k words of the caption and predict the word at time step k+1
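The next-word prediction above can be turned into caption generation by feeding each predicted word back in. The sketch below uses greedy decoding, which is an assumption on my part (these notes do not say which decoding strategy the paper uses), and replaces the fMRI-conditioned decoder with a hypothetical lookup table `TOY_TABLE`.

```python
def greedy_decode(next_token_probs, bos, eos, max_len):
    """Repeatedly append the most probable next token until EOS or max_len."""
    tokens = [bos]
    while len(tokens) < max_len:
        probs = next_token_probs(tokens)          # dict: token -> probability
        next_tok = max(probs, key=probs.get)      # greedy choice
        tokens.append(next_tok)
        if next_tok == eos:
            break
    return tokens

# Toy stand-in for the decoder: the distribution depends only on the last
# token here, whereas a real model would condition on the full prefix and
# on the fMRI embedding.
TOY_TABLE = {
    "<bos>": {"a": 0.9, "dog": 0.1},
    "a": {"dog": 0.8, "<eos>": 0.2},
    "dog": {"<eos>": 0.7, "a": 0.3},
}

def toy_next_token_probs(prefix):
    return TOY_TABLE[prefix[-1]]

caption = greedy_decode(toy_next_token_probs, "<bos>", "<eos>", max_len=10)
```

Here `caption` ends up as the token list `<bos>`, `a`, `dog`, `<eos>`; the `max_len` guard keeps the loop finite if the model never emits the end token.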

        ⑧fQA task: the question is given as the text-encoder input, e.g. "Question: What color is the water? Answer:"
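A minimal sketch of the prompt format in the example above. Both helpers are hypothetical: `build_fqa_prompt` wraps a question in the "Question: ... Answer:" template, and `extract_answer` assumes the model output repeats the prompt before the answer, which may not match the real pipeline.

```python
def build_fqa_prompt(question):
    """Wrap a question in the 'Question: ... Answer:' template."""
    return f"Question: {question.strip()} Answer:"

def extract_answer(generated, prompt):
    """Return the text after the prompt, or the whole output if the prompt is absent."""
    if generated.startswith(prompt):
        return generated[len(prompt):].strip()
    return generated.strip()

prompt = build_fqa_prompt("What color is the water?")
answer = extract_answer(prompt + " blue", prompt)   # "blue" is a made-up output
```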

        ⑨Caption generation (ignores image encoder and corresponding loss):

where green text marks the generated captions and red text marks grammatical errors

2.4. Experiment

2.4.1. Implementation

        ①Dataset 1: subject 1 of the Natural Scenes Dataset (NSD), with 15,724 voxels and captions from COCO, used for fMRI captioning. The NSD and VQA datasets are combined to build the fQA task

        ②Dataset 2: HCP (Human Connectome Project), used to predict masked fMRI in the pre-training stage

        ③Encoder and decoder: ViT

        ④Mask ratio: 0.75

        ⑤Hyperparameters during pre-training: 5e-10 learning rate with 0.05 weight decay; AdamW with \beta _1=0.9 and \beta _2=0.95; NativeScaler gradient scaling

        ⑥Hyperparameters for brain decoder training: 1e-4 learning rate with 0.1 weight decay

        ⑦Loss weights: 20 for the caption loss and 1 for the contrastive losses
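To make the optimizer settings concrete, here is one AdamW update for a single scalar parameter, written out by hand. The function `adamw_step` is a hypothetical helper that follows the standard AdamW rule with decoupled weight decay; combining the brain-decoder learning rate and weight decay with the \beta values listed for pre-training is an assumption made only for this illustration.

```python
import math

def adamw_step(p, g, m, v, t, lr, weight_decay, beta1=0.9, beta2=0.95, eps=1e-8):
    """One AdamW update for a scalar parameter p with gradient g (t is the 1-based step)."""
    p = p * (1.0 - lr * weight_decay)        # decoupled weight decay
    m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)           # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

# Zero gradient: only the weight decay acts on the parameter.
p_decay, _, _ = adamw_step(1.0, 0.0, 0.0, 0.0, t=1, lr=1e-4, weight_decay=0.1)

# Non-zero gradient: the first step moves the parameter by roughly lr.
p_grad, m1, v1 = adamw_step(1.0, 2.0, 0.0, 0.0, t=1, lr=1e-4, weight_decay=0.1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the size of the update is set by the learning rate rather than by the gradient magnitude.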

2.4.2. fMRI Captioning Evaluation

        ①Captioning performance:

        ②Quantitative evaluation of fMRI captioning:

2.4.3. fQA Evaluation

        ①fQA performance:

        ②Results of fQA:

where green text marks the answers

2.4.4. Adapting BrainChat without Image Data

        ①Performance remains acceptable even without image data

2.5. Conclusion

        ~

3. Reference

@INPROCEEDINGS{10889434,
  author={Huang, Wanqiu and Ma, Ke and Xie, Tingyu and Wang, Hongwei},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={BrainChat: Interactive Semantic Information Decoding from fMRI Using Large-Scale Vision-Language Pretrained Models}, 
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Training;Semantics;Functional magnetic resonance imaging;Signal processing;Brain modeling;Question answering (information retrieval);Decoding;Data mining;Speech processing;Software development management;fMRI question answering;fMRI captioning;fMRI decoding;large-scale vision-language model;human-computer interaction},
  doi={10.1109/ICASSP49660.2025.10889434}}
