Reading notes: Non-Local Context Encoder: Robust Biomedical Image Segmentation against Adversarial Attacks

This post is a reading note on the paper "Non-Local Context Encoder: Robust Biomedical Image Segmentation against Adversarial Attacks" by He, Xiang, et al., AAAI 2019. Broadly speaking, the paper deals with adversarial attacks on biomedical image segmentation models. Specifically, it proposes a non-local context encoder that models short- and long-range spatial dependencies and encodes global context to enhance adversarial robustness.


Introduction


Medical image segmentation is a fundamental part of image analysis for computer-aided diagnosis. It provides pixel-level annotation of regions of interest (ROIs) such as organs, lesions, and other structures in medical images (e.g. MRI, CT, X-ray). However, it is challenging because training datasets are limited in size and diversity. With advances in hardware, CNN-based methods such as U-Net and FCN have achieved great success in medical image segmentation.


In parallel with this progress, recent work shows that semantic segmentation models are threatened by adversarial examples. Compared to other tasks such as natural image classification, biomedical image segmentation lacks sufficient training examples, which leads to weak generalization capability. This makes such systems more sensitive to adversarial perturbations and therefore more vulnerable to attacks. Figure 1 shows some examples of adversarial attacks on lung segmentation.


Figure 1. Sample adversarial attacks on four segmentation models. The adversarial perturbations are generated by the Iterative FGSM attack method.
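
To make the attack concrete, here is a minimal sketch of Iterative FGSM. It is not the setup used in the paper: the "segmentation model" is a hypothetical per-pixel logistic classifier with scalar parameters w and b, chosen only because its input gradient can be written by hand. The names epsilon (perturbation budget), alpha (step size) and steps are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bce_loss_and_grad(x, y, w, b):
    """Summed binary cross-entropy of a toy per-pixel classifier
    p = sigmoid(w * x + b), plus the gradient of the loss w.r.t. x."""
    p = sigmoid(w * x + b)
    tiny = 1e-12  # avoids log(0)
    loss = -np.sum(y * np.log(p + tiny) + (1 - y) * np.log(1 - p + tiny))
    grad_x = (p - y) * w  # analytic gradient for this toy model only
    return float(loss), grad_x

def iterative_fgsm(x, y, w, b, epsilon=0.03, alpha=0.01, steps=10):
    """Untargeted Iterative FGSM: repeated signed-gradient ascent steps,
    projected back into an L-infinity ball of radius epsilon around x."""
    x_adv = x.copy()
    for _ in range(steps):
        _, grad = bce_loss_and_grad(x_adv, y, w, b)
        x_adv = x_adv + alpha * np.sign(grad)             # increase the loss
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # stay within budget
        x_adv = np.clip(x_adv, 0.0, 1.0)                  # stay a valid image
    return x_adv
```

With a real network the analytic gradient would be replaced by automatic differentiation; the update and projection steps stay the same.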

Overall Idea

The main idea is based on the assumption given by the authors:


"The robustness of biomedical image segmentation can be improved effectively by global spatial dependencies and global contextual information."

1. Global spatial dependencies: Given a pixel, capturing global spatial dependencies means predicting this pixel using all highly related pixels in the entire image. The authors offer two perspectives on why global dependencies help. First, if a wrong label is assigned to the pixel, the increased loss is passed to all related pixels by back-propagation, which increases the intensity of the perturbation and makes the adversarial example significantly different from the original image. Second, the perturbation added to the original image is weakened by fusion with related pixels during forward propagation.


2. Global contextual information: Global contextual information can enhance adversarial robustness because the configuration of the human body is relatively stable. For example, in lung segmentation, because the right and left lungs are associated, an attack must apply the same amount of perturbation to both lungs. Therefore, the overall intensity of the attack may be doubled.

Unfortunately, most CNNs have difficulty capturing global dependencies, since a convolution operation only processes one local neighborhood at a time.


Inspired by the above analysis, the authors integrate a non-local network [1] (global spatial dependencies) and a context encoding module [2] (global contextual information) into the feature pyramid network [3] to improve the adversarial robustness of medical image segmentation models.

Details


Figure 2. The architecture of the proposed non-local context encoder (NLCE)

Figure 2 shows the architecture of the non-local context encoder (NLCE), which combines a non-local module (left) and a context module (right). The input is an H×W×C feature map. The feature map is split into N=H×W C-dimensional features X={x_1,…,x_N}. A function f learns a relationship between any two features x_i and x_j as:


Then the non-local response is defined as:

y_i = (1 / C(x)) · Σ_j f(x_i, x_j) g(x_j)

Here g is a matrix transformation and C(x) is the normalization factor. Because the operation considers all features, the non-local response y_i captures both short- and long-range dependencies. Next, the enhanced features z_i = W_z * y_i + x_i, which combine the non-local response with the original feature, are fed into the next module.
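
The non-local step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the pairwise function f is taken to be the embedded-Gaussian form from the non-local network paper [1] (a softmax over dot products, which also plays the role of the 1/C(x) normalization), and the weight names W_theta, W_phi, W_g and W_z are hypothetical. The form of f actually used in the NLCE paper may differ.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(feature_map, W_theta, W_phi, W_g, W_z):
    """feature_map: (H, W, C). W_theta, W_phi, W_g: (C, C_inner).
    W_z: (C_inner, C). Returns enhanced features (H, W, C) and the
    (N, N) pairwise weights, where N = H * W."""
    H, W, C = feature_map.shape
    X = feature_map.reshape(H * W, C)           # N features x_i of dimension C
    theta, phi = X @ W_theta, X @ W_phi         # embeddings used by f (assumed)
    G = X @ W_g                                 # g(x_j) as a matrix transformation
    weights = softmax(theta @ phi.T, axis=1)    # f(x_i, x_j) / C(x), rows sum to 1
    Y = weights @ G                             # y_i: weighted sum over ALL positions j
    Z = Y @ W_z + X                             # z_i = W_z * y_i + x_i (residual)
    return Z.reshape(H, W, C), weights
```

Because every row of the weight matrix spans all N positions, each output feature depends on the whole image, which is exactly the distance-independent dependency the notes refer to. The N×N matrix is also why the operation gets expensive on large feature maps.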

For the global contextual information, the module learns a global codebook D={d_1,…,d_K}, which contains K C''-dimensional codewords. The codebook represents global statistical information about the non-local enhanced features. The enhanced features are first transformed to the same dimensionality as the codewords; the resulting C''-dimensional features are denoted Z′={z'_1,…,z'_N}. The normalized residual e_ik is defined as:

Thus, the residual information for all features captured by the codeword d_k is defined as e_k = sum_i(e_ik), and the global context is defined as e = sum_k(σ(e_k)), where σ denotes batch normalization. Finally, the channel-wise scaling factor is γ = sigmoid(W_γ e), and the output of the NLCE is the element-wise multiplication of γ and the enhanced feature F_z.
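
A rough NumPy sketch of the context module follows. Several pieces are assumptions rather than details taken from the paper: the normalized residual e_ik uses the soft-assignment weighting of the context encoding module [2] with hypothetical smoothing factors s, the projection matrix W_proj is a stand-in for the dimensionality change to C'', and batch normalization σ is replaced by a simple per-codeword standardization because a single example has no batch statistics.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def context_encoding(Z, W_proj, D, s, W_gamma):
    """Z: (N, C) enhanced features. W_proj: (C, C2) projection to the
    codeword dimension. D: (K, C2) codebook. s: (K,) smoothing factors
    (assumed, following [2]). W_gamma: (C, C2). Returns (scaled Z, gamma)."""
    Z_prime = Z @ W_proj                                # z'_i, shape (N, C2)
    R = Z_prime[:, None, :] - D[None, :, :]             # residuals z'_i - d_k, (N, K, C2)
    logits = -s[None, :] * np.sum(R ** 2, axis=2)       # (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)
    A = np.exp(logits)
    A = A / A.sum(axis=1, keepdims=True)                # soft assignment over codewords
    E_k = (A[:, :, None] * R).sum(axis=0)               # e_k = sum_i e_ik, (K, C2)
    # Stand-in for batch normalization: standardize each e_k over its channels.
    E_norm = (E_k - E_k.mean(axis=1, keepdims=True)) / (E_k.std(axis=1, keepdims=True) + 1e-5)
    e = E_norm.sum(axis=0)                              # global context, (C2,)
    gamma = sigmoid(W_gamma @ e)                        # channel-wise scaling, (C,)
    return Z * gamma[None, :], gamma
```

Since γ lies in (0, 1), the module can only attenuate channels of the enhanced features, guided by statistics pooled over the whole image.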


Figure 3. The overall architecture of the proposed non-local context encoding network (NLCEN)

As shown in Figure 3, the architecture of the global phase is based on a ResNet backbone and the feature pyramid network. Fusing low-level and high-level features captures rich contextual information. An NLCE module is attached to each of conv2 through conv5 to obtain multi-level robust non-local feature maps, denoted E. The fused feature map P is obtained by element-wise addition of E and the up-sampled P from the next coarser level. The multi-level feature maps are then fused via up-sampling and concatenation, and the refined segmentation prediction is produced directly from the fused features.
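
The top-down fusion and the final concatenation can be sketched as follows. The sketch assumes that every level has the same number of channels, that the spatial resolution halves from one level to the next, and that up-sampling is nearest-neighbor; the paper may use different choices (for example lateral convolutions or bilinear up-sampling), and the function names are hypothetical.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbor up-sampling of an (H, W, C) map by an integer factor."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def top_down_fusion(E):
    """E: list of NLCE outputs ordered fine to coarse (conv2 ... conv5),
    each (H_l, W_l, C) with H_l, W_l halving per level.
    Returns P with P[l] = E[l] + upsample(P[l + 1])."""
    P = [None] * len(E)
    P[-1] = E[-1]                                   # coarsest level has nothing above it
    for l in range(len(E) - 2, -1, -1):
        P[l] = E[l] + upsample_nearest(P[l + 1], 2)  # element-wise addition
    return P

def fuse_for_refinement(P):
    """Up-sample every P[l] to the finest resolution and concatenate channels."""
    target_h = P[0].shape[0]
    upsampled = [upsample_nearest(p, target_h // p.shape[0]) for p in P]
    return np.concatenate(upsampled, axis=2)
```

The refined prediction would then be computed from the concatenated map by a small prediction head, which is omitted here.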


The loss for a single predicted map is defined as the sum of the per-pixel cross-entropy losses between the ground truth and the predicted segmentation map. The losses for the segmentation predictions obtained from P^2,…,P^5 are denoted L_g^2,…,L_g^5, and the loss for the refined segmentation is denoted L_r. The total loss is defined as:
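
As a small illustration of this deep supervision scheme, the sketch below computes the summed per-pixel cross-entropy for one map and then combines the losses. Two things are assumptions: all predictions are taken to be already resized to the label resolution, and the total is an unweighted sum of L_r and the L_g terms, which may not match the exact weighting used in the paper.

```python
import numpy as np

def pixelwise_cross_entropy(probs, labels):
    """probs: (H, W, K) class probabilities. labels: (H, W) integer class ids.
    Returns the cross-entropy summed over all pixels."""
    h, w = labels.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    picked = probs[rows, cols, labels]          # probability of the true class per pixel
    return float(-np.sum(np.log(picked + 1e-12)))

def total_loss(global_probs, refined_probs, labels):
    """global_probs: list of predictions from P^2..P^5 (assumed already at
    label resolution). Unweighted sum of all losses (an assumption)."""
    global_losses = [pixelwise_cross_entropy(p, labels) for p in global_probs]
    refined_loss = pixelwise_cross_entropy(refined_probs, labels)
    return refined_loss + sum(global_losses)
```

Supervising every pyramid level, and not only the refined output, gives each NLCE module its own training signal.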

Experiments

The datasets used in the experiments are ISBI 2016 skin lesion segmentation and JSRT lung segmentation. The robustness of a biomedical image segmentation method is measured by the drop in segmentation accuracy under different levels of attack. Dice's coefficient and the Jaccard similarity coefficient are used to measure segmentation performance.
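
For reference, both metrics are easy to compute on binary masks. The sketch below is a generic implementation, not the evaluation code of the paper; returning 1.0 when both masks are empty is a convention chosen here.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice = 2 |A ∩ B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 1.0 if denom == 0 else float(2.0 * intersection / denom)

def jaccard_index(pred, target):
    """Jaccard = |A ∩ B| / |A ∪ B| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return 1.0 if union == 0 else float(intersection / union)
```

The two scores are monotonically related by J = D / (2 - D), so they rank methods the same way on a single image; robustness is then read off as how much either score drops when the input is attacked.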


Figure 4. Comparison of results in terms of DIC (left) and JSC (right) on the JSRT (upper) and ISBI (bottom) datasets.

As shown in Figure 4, the adversarial attacks generated by FGSM have little effect on the authors' lung segmentation model. For skin lesion segmentation, where there is very little contextual information, their method is less robust than on lung segmentation but still outperforms the others.

My Opinion

This paper proposes an adversarially robust module, the non-local context encoder (NLCE), which captures distance-independent dependencies and global contextual information and can be applied to other CNN-based segmentation methods. The extensive experiments show that the non-local context encoding network achieves high segmentation accuracy and is robust to adversarial samples at different attack levels.

However, some issues could be addressed in the future: 1. The NLCE module may lower segmentation performance. 2. The attack method is simple and untargeted, and it changes the structure of the resulting mask so that it is quite different from the ground truth. I wonder whether their method can handle more carefully crafted attacks, for example, an attack that only rotates the mask slightly.

References

[1] Wang, Xiaolong, et al. "Non-local neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[2] Zhang, Hang, et al. "Context encoding for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

[3] Lin, Tsung-Yi, et al. "Feature pyramid networks for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.