Python Deep Learning in Practice, Chapter 9: Advanced Deep Learning for Computer Vision (Image Classification, Image Segmentation, Object Detection)

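Overview

Chapter 9 takes a closer look at advanced deep learning techniques for computer vision, covering the core tasks of image classification, image segmentation, and object detection. It also introduces the architecture patterns of modern convolutional neural networks (convnets), such as residual connections, batch normalization, and depthwise separable convolutions. After working through the chapter, readers should be able to apply deep learning to complex computer vision problems and to interpret what a convnet has learned.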

Main content

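  1. The three core computer vision tasks

    • Image classification: assign one or more labels to an image.
    • Image segmentation: partition an image into distinct regions, each usually corresponding to one category.
    • Object detection: draw bounding boxes around objects in an image and associate each box with a class.
  2. Modern convnet architecture patterns

    • Residual connections: skip connections that mitigate the vanishing-gradient problem, making it possible to train deeper networks.
    • Batch normalization: normalizes the activations passed between layers, which speeds up training and often improves model performance.
    • Depthwise separable convolutions: split a convolution into a per-channel spatial convolution and a pointwise convolution that mixes channels, reducing the parameter count and the amount of computation.
  3. Interpreting what convnets learn

    • Visualizing intermediate activations: display the outputs of successive convolution layers to see how the network extracts features step by step.
    • Visualizing filters: use gradient ascent to generate the input pattern that a given filter responds to most strongly.
    • Class activation heatmaps: produce a heatmap showing which parts of an image matter most for a given class.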
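
The gradient-ascent idea behind filter visualization can be illustrated without a neural network. The following is a minimal sketch, assuming a toy objective `score` (a hypothetical stand-in for a filter's mean activation) whose maximum is known to lie at (3, -1). As in normalized gradient ascent, every step moves a fixed distance of roughly `learning_rate` along the gradient direction.

```python
import math

def score(v):
    # Hypothetical stand-in for "mean activation of one filter"; maximal at (3, -1).
    return -((v[0] - 3.0) ** 2) - ((v[1] + 1.0) ** 2)

def score_gradient(v):
    # Analytic gradient of score with respect to v.
    return [-2.0 * (v[0] - 3.0), -2.0 * (v[1] + 1.0)]

def gradient_ascent(start, learning_rate=0.1, iterations=100):
    v = list(start)
    for _ in range(iterations):
        grad = score_gradient(v)
        norm = math.sqrt(sum(g * g for g in grad)) + 1e-8  # avoid division by zero
        # L2-normalize the gradient so each step has length close to learning_rate.
        v = [x + learning_rate * g / norm for x, g in zip(v, grad)]
    return v

result = gradient_ascent([0.0, 0.0])
distance = math.hypot(result[0] - 3.0, result[1] + 1.0)
print(result, distance)
```

Because the step length is fixed, the iterate ends up hovering within one step of the maximum rather than converging exactly; for visualizing a filter that is usually good enough.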

Key code and algorithms

1.1 Image segmentation example

from tensorflow import keras
from tensorflow.keras import layers

def get_model(img_size, num_classes):
    inputs = keras.Input(shape=img_size + (3,))
    x = layers.Rescaling(1./255)(inputs)
    x = layers.Conv2D(64, 3, strides=2, activation="relu", padding="same")(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(128, 3, strides=2, activation="relu", padding="same")(x)
    x = layers.Conv2D(128, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(256, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(256, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(256, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(256, 3, activation="relu", padding="same", strides=2)(x)
    x = layers.Conv2DTranspose(128, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(128, 3, activation="relu", padding="same", strides=2)(x)
    x = layers.Conv2DTranspose(64, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(64, 3, activation="relu", padding="same", strides=2)(x)
    outputs = layers.Conv2D(num_classes, 3, activation="softmax", padding="same")(x)
    model = keras.Model(inputs, outputs)
    return model

model = get_model(img_size=(200, 200), num_classes=3)
model.summary()
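
To see why the predicted mask has the same resolution as the input, it can help to trace the spatial size through the strided layers. This is a minimal sketch, assuming the Keras `padding="same"` behavior (a strided `Conv2D` yields ceil(n / stride), a strided `Conv2DTranspose` yields n * stride). The helper `trace_size` is hypothetical and not part of Keras.

```python
import math

def trace_size(size, num_down=3, num_up=3, stride=2):
    """Return the spatial size after each strided layer of the encoder-decoder."""
    sizes = [size]
    for _ in range(num_down):
        size = math.ceil(size / stride)  # Conv2D, strides=2, padding="same"
        sizes.append(size)
    for _ in range(num_up):
        size = size * stride             # Conv2DTranspose, strides=2, padding="same"
        sizes.append(size)
    return sizes

print(trace_size(200))  # [200, 100, 50, 25, 50, 100, 200]
print(trace_size(150))  # [150, 75, 38, 19, 38, 76, 152]
```

With a 200 x 200 input the three downsampling steps are undone exactly. A size such as 150, which is not divisible by 8, comes back as 152, so the output would no longer line up with the target masks.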

1.2 Residual connections

def residual_block(x, filters, pooling=False):
    residual = x
    x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
    if pooling:
        x = layers.MaxPooling2D(2, padding="same")(x)
        residual = layers.Conv2D(filters, 1, strides=2)(residual)
    elif filters != residual.shape[-1]:
        residual = layers.Conv2D(filters, 1)(residual)
    x = layers.add([x, residual])
    return x

inputs = keras.Input(shape=(32, 32, 3))
x = layers.Rescaling(1./255)(inputs)
x = residual_block(x, filters=32, pooling=True)
x = residual_block(x, filters=64, pooling=True)
x = residual_block(x, filters=128, pooling=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()
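
The 1 x 1 convolutions on the shortcut exist only to make the two branches the same shape before they are added. The sketch below checks this with plain shape arithmetic, assuming the usual Keras output-size rules for `padding="same"` pooling and for a `"valid"` 1 x 1 convolution. The two helper functions are hypothetical and mirror the branches of `residual_block`.

```python
import math

def main_branch_shape(shape, filters, pooling):
    h, w, _ = shape
    # Two 3x3 "same" convolutions keep height and width and set channels to `filters`.
    if pooling:
        h, w = math.ceil(h / 2), math.ceil(w / 2)  # MaxPooling2D(2, padding="same")
    return (h, w, filters)

def residual_branch_shape(shape, filters, pooling):
    h, w, c = shape
    if pooling:
        # 1x1 Conv2D, strides=2, default "valid" padding: floor((n - 1) / 2) + 1
        return ((h - 1) // 2 + 1, (w - 1) // 2 + 1, filters)
    if filters != c:
        return (h, w, filters)  # 1x1 Conv2D changes only the channel count
    return shape                # identity shortcut, no projection needed

shape = (32, 32, 3)
for filters, pooling in [(32, True), (64, True), (128, False)]:
    main = main_branch_shape(shape, filters, pooling)
    shortcut = residual_branch_shape(shape, filters, pooling)
    assert main == shortcut  # layers.add requires identical shapes
    shape = main
    print(filters, pooling, shape)
```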

1.3 Batch normalization

x = layers.Conv2D(32, 3, use_bias=False)(x)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)
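
The snippet above sets `use_bias=False` because the normalization step subtracts the batch mean, which cancels any constant bias added just before it. A minimal sketch of the training-time computation for a single feature follows, assuming batch statistics are used (at inference Keras uses moving averages instead). The function `batch_normalize` is hypothetical, with `gamma` and `beta` standing in for the layer's learned scale and offset.

```python
import math

def batch_normalize(values, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalize one feature over a batch, then scale by gamma and shift by beta."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(variance + epsilon) + beta for v in values]

batch = [2.0, 4.0, 6.0, 8.0]
shifted = [v + 100.0 for v in batch]  # the same activations with a constant bias added

normalized = batch_normalize(batch)
normalized_shifted = batch_normalize(shifted)
print(normalized)
print(normalized_shifted)
```

Both printed lists are the same, which is why a bias in the preceding convolution would be redundant; the learned offset `beta` takes over that role.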

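1.4 Depthwise separable convolutions

# The activation is left out of the convolution so that ReLU is applied once, after normalization.
x = layers.SeparableConv2D(32, 3, padding="same", use_bias=False)(x)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)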
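
The parameter savings are easy to verify by counting weights. A minimal sketch, ignoring bias terms; the two helper functions are hypothetical and only do the arithmetic.

```python
def conv2d_weights(kernel_size, in_channels, out_channels):
    # Regular convolution: one kernel_size x kernel_size x in_channels kernel per output channel.
    return kernel_size * kernel_size * in_channels * out_channels

def separable_conv2d_weights(kernel_size, in_channels, out_channels):
    depthwise = kernel_size * kernel_size * in_channels  # one spatial kernel per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution that mixes channels
    return depthwise + pointwise

regular = conv2d_weights(3, 64, 64)
separable = separable_conv2d_weights(3, 64, 64)
print(regular, separable)  # 36864 4672
```

For a 3 x 3 layer with 64 input and 64 output channels, the separable version needs roughly one eighth of the weights.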

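1.5 Visualizing intermediate activations

from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

# Collect the outputs of every convolution and pooling layer.
# model is assumed to be an already trained convnet.
layer_outputs = []
layer_names = []
for layer in model.layers:
    if isinstance(layer, (layers.Conv2D, layers.MaxPooling2D)):
        layer_outputs.append(layer.output)
        layer_names.append(layer.name)
# A single model that returns all of these activations for a given input.
activation_model = keras.Model(inputs=model.input, outputs=layer_outputs)
# img_tensor is assumed to be a preprocessed image batch of shape (1, height, width, 3).
activations = activation_model.predict(img_tensor)

# Display channel 5 of the first collected layer's activation.
first_layer_activation = activations[0]
plt.matshow(first_layer_activation[0, :, :, 5], cmap="viridis")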

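1.6 Filter visualization

import numpy as np
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt

# Assumption: feature_extractor returns the activations of one convolution layer.
# An Xception base and its "block3_sepconv1" layer are used here as an example.
conv_base = keras.applications.xception.Xception(weights="imagenet", include_top=False)
feature_extractor = keras.Model(
    inputs=conv_base.input,
    outputs=conv_base.get_layer("block3_sepconv1").output)

def compute_loss(image, filter_index):
    activation = feature_extractor(image)
    # Drop a 2-pixel border to avoid edge artifacts, then average the filter's response.
    filter_activation = activation[:, 2:-2, 2:-2, filter_index]
    return tf.reduce_mean(filter_activation)

@tf.function
def gradient_ascent_step(image, filter_index, learning_rate):
    with tf.GradientTape() as tape:
        tape.watch(image)  # image is a plain tensor, so it must be watched explicitly
        loss = compute_loss(image, filter_index)
    grads = tape.gradient(loss, image)
    grads = tf.math.l2_normalize(grads)  # keep the step size stable
    image += learning_rate * grads       # move the image toward a higher activation
    return image

def generate_filter_pattern(filter_index):
    iterations = 30
    learning_rate = 10.
    # Start from a gray image with slight random noise.
    image = tf.random.uniform(minval=0.4, maxval=0.6, shape=(1, 200, 200, 3))
    for i in range(iterations):
        image = gradient_ascent_step(image, filter_index, learning_rate)
    return image[0].numpy()

def deprocess_image(image):
    # Center and rescale the values into the displayable [0, 255] range.
    image -= image.mean()
    image /= image.std()
    image *= 64
    image += 128
    image = np.clip(image, 0, 255).astype("uint8")
    image = image[25:-25, 25:-25, :]  # crop the border to avoid edge artifacts
    return image

plt.axis("off")
plt.imshow(deprocess_image(generate_filter_pattern(filter_index=2)))

1.7 Class activation heatmaps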

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Assumptions: model is a full Xception classifier (the layer names below are Xception's),
# img_array is a preprocessed image batch, and img_path points to the original image file.
last_conv_layer_name = "block14_sepconv2_act"
classifier_layer_names = ["avg_pool", "predictions"]
last_conv_layer = model.get_layer(last_conv_layer_name)
# Maps the input image to the activations of the last convolution layer.
last_conv_layer_model = keras.Model(model.inputs, last_conv_layer.output)

# Maps those activations to the final class predictions.
classifier_input = keras.Input(shape=last_conv_layer.output.shape[1:])
x = classifier_input
for layer_name in classifier_layer_names:
    x = model.get_layer(layer_name)(x)
classifier_model = keras.Model(classifier_input, x)

# Gradient of the top predicted class with respect to the last convolution layer's output.
with tf.GradientTape() as tape:
    last_conv_layer_output = last_conv_layer_model(img_array)
    tape.watch(last_conv_layer_output)
    preds = classifier_model(last_conv_layer_output)
    top_pred_index = tf.argmax(preds[0])
    top_class_channel = preds[:, top_pred_index]
grads = tape.gradient(top_class_channel, last_conv_layer_output)

# Weight each channel by its mean gradient, then average over channels.
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2)).numpy()
last_conv_layer_output = last_conv_layer_output.numpy()[0]
for i in range(pooled_grads.shape[-1]):
    last_conv_layer_output[:, :, i] *= pooled_grads[i]
heatmap = np.mean(last_conv_layer_output, axis=-1)

# Keep only positive evidence and scale it to [0, 1].
heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
plt.matshow(heatmap)

# Colorize the heatmap, resize it to the original image, and overlay the two.
img = keras.utils.load_img(img_path)
img = keras.utils.img_to_array(img)
heatmap = np.uint8(255 * heatmap)
jet = matplotlib.colormaps["jet"]  # replaces cm.get_cmap, which recent Matplotlib releases dropped
jet_colors = jet(np.arange(256))[:, :3]
jet_heatmap = jet_colors[heatmap]
jet_heatmap = keras.utils.array_to_img(jet_heatmap)
jet_heatmap = jet_heatmap.resize((img.shape[1], img.shape[0]))
jet_heatmap = keras.utils.img_to_array(jet_heatmap)
superimposed_img = jet_heatmap * 0.4 + img
superimposed_img = keras.utils.array_to_img(superimposed_img)
superimposed_img.save("elephant_cam.jpg")
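
The core of the heatmap computation (weight each channel by its pooled gradient, average over channels, keep the positive part, normalize) can be followed on a tiny example. This is a minimal sketch in plain Python, assuming a hypothetical function `class_activation_map` and channel-first nested lists for readability; the code above uses channel-last NumPy arrays.

```python
def class_activation_map(feature_maps, pooled_grads):
    """feature_maps: list of K maps (each an H x W nested list); pooled_grads: K weights."""
    num_channels = len(feature_maps)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel-wise weighted mean at every spatial position.
    heatmap = [
        [
            sum(pooled_grads[c] * feature_maps[c][i][j] for c in range(num_channels))
            / num_channels
            for j in range(width)
        ]
        for i in range(height)
    ]
    # Keep only positive contributions (ReLU).
    heatmap = [[max(v, 0.0) for v in row] for row in heatmap]
    # Normalize to [0, 1]; skip when the map is all zeros to avoid dividing by zero.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap

# Two 2x2 channels: the first supports the class (weight 2.0), the second opposes it (-1.0).
channels = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
toy_heatmap = class_activation_map(channels, pooled_grads=[2.0, -1.0])
print(toy_heatmap)  # [[1.0, 0.0], [0.0, 0.0]]
```

Only the top-left position lights up: the bottom-right position is driven by a channel with a negative weight and is clipped to zero. The guard on `peak` is an addition in this sketch; the listing above divides by `np.max(heatmap)` unconditionally.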

Notable quotes

  1. Quote: Convnets are the best type of machine learning models for computer vision tasks.
    Explanation: This sentence underlines how important convnets are in computer vision.

  2. Quote: Model architecture is often the difference between success and failure.
    Explanation: This sentence underlines the importance of model architecture in deep learning.

  3. Quote: Residual connections help you train deeper networks.
    Explanation: This sentence describes the role residual connections play in addressing the vanishing-gradient problem.

  4. Quote: Batch normalization helps accelerate training and improve model performance.
    Explanation: This sentence sums up the main benefits of batch normalization.

  5. Quote: Depthwise separable convolutions improve efficiency by reducing parameters and computations.
    Explanation: This sentence highlights how depthwise separable convolutions make models more efficient.

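Summary

After this chapter, readers will have a working knowledge of advanced deep learning techniques for computer vision, including image classification, image segmentation, and object detection. With an understanding of modern convnet architecture patterns and interpretation methods, they will be able to build better-performing models and to examine how a model arrives at its decisions.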
