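Python Deep Learning in Practice, Chapter 9: Advanced Deep Learning Applications in Computer Vision (Image Classification, Image Segmentation, Object Detection)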

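By the end of this chapter, readers will have a working grasp of advanced deep learning techniques for computer vision: image classification, image segmentation, and object detection. Understanding the architecture patterns of modern convolutional neural networks, and the methods for interpreting them, makes it possible to build higher-performing models and to see how a model arrives at its decisions.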

qq_26226783 · Published 2025-04-18 09:00:00

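Overview

Chapter 9 explores advanced deep learning techniques for computer vision, covering the core tasks of image classification, image segmentation, and object detection. It also introduces the architecture patterns of modern convolutional neural networks (convnets): residual connections, batch normalization, and depthwise separable convolutions. After this chapter, readers will know how to apply deep learning to complex computer vision problems and how to interpret what a convnet has learned.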

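Main Content

The three core computer vision tasks
- Image classification: assign one or more labels to an image.
- Image segmentation: partition an image into regions, each typically corresponding to a class.
- Object detection: draw bounding boxes around the objects in an image and associate each box with a class.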
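Architecture patterns of modern convolutional neural networks
- Residual connections: skip connections that mitigate the vanishing gradient problem, making deeper networks trainable.
- Batch normalization: normalizes the activations passed between layers, which speeds up training and improves model performance.
- Depthwise separable convolutions: split a convolution into a per-channel spatial convolution followed by a 1x1 pointwise convolution that mixes channels, reducing both parameter count and computation.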
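Interpreting what convnets learn
- Visualizing intermediate activations: display the outputs of successive convolutional layers to see how the network extracts features step by step.
- Visualizing filters: use gradient ascent in input space to generate the pattern that most strongly activates a given filter.
- Class activation heatmaps: produce a heatmap showing which parts of an image matter most for a given class.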
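
To make the three core tasks listed above concrete, the sketch below shows what the target for each one might look like for a single tiny image. It is only an illustration: the 4x6 image size, the class indices, and all variable names are assumptions, not something taken from the chapter.

```python
import numpy as np

height, width = 4, 6  # hypothetical tiny image

# Image classification: one label per image, or a multi-hot vector
# when several labels can apply at once.
single_label = 1
multi_hot_label = np.array([0, 1, 1])

# Image segmentation: one class index per pixel (0 = background, 1 = object).
mask = np.zeros((height, width), dtype="int64")
mask[1:3, 2:5] = 1

# Object detection: a bounding box plus a class for each object.
# Here the box is derived from the mask as (x_min, y_min, x_max, y_max),
# with exclusive upper bounds.
ys, xs = np.nonzero(mask == 1)
box = (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
box_class = 1

print(mask.shape)  # (4, 6)
print(box)         # (2, 1, 5, 3)
```

The segmentation mask carries the most detail (a decision per pixel), the bounding box is a coarser summary of the same object, and the classification label says nothing about location.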

Key Code and Algorithms

1.1 Image segmentation example

from tensorflow import keras
from tensorflow.keras import layers

def get_model(img_size, num_classes):
    inputs = keras.Input(shape=img_size + (3,))
    # Scale pixel values from [0, 255] to [0, 1].
    x = layers.Rescaling(1./255)(inputs)
    # Encoder: three stride-2 convolutions, each halving the spatial size.
    x = layers.Conv2D(64, 3, strides=2, activation="relu", padding="same")(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(128, 3, strides=2, activation="relu", padding="same")(x)
    x = layers.Conv2D(128, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(256, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(256, 3, activation="relu", padding="same")(x)
    # Decoder: three stride-2 transposed convolutions, each doubling the spatial size.
    x = layers.Conv2DTranspose(256, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(256, 3, activation="relu", padding="same", strides=2)(x)
    x = layers.Conv2DTranspose(128, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(128, 3, activation="relu", padding="same", strides=2)(x)
    x = layers.Conv2DTranspose(64, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(64, 3, activation="relu", padding="same", strides=2)(x)
    # Per-pixel softmax over the classes.
    outputs = layers.Conv2D(num_classes, 3, activation="softmax", padding="same")(x)
    model = keras.Model(inputs, outputs)
    return model

model = get_model(img_size=(200, 200), num_classes=3)
model.summary()
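
The way the encoder and decoder in get_model mirror each other can be checked with plain arithmetic. The sketch below assumes the usual Keras sizing rules for padding="same" (a strided convolution yields ceil(size / stride), a strided transposed convolution yields size * stride); the helper names are hypothetical.

```python
import math

def conv_same(size, stride):
    # Conv2D with padding="same": output size is ceil(size / stride).
    return math.ceil(size / stride)

def conv_transpose_same(size, stride):
    # Conv2DTranspose with padding="same": output size is size * stride.
    return size * stride

# Strides of the six encoder and six decoder layers in get_model.
ENCODER_STRIDES = [2, 1, 2, 1, 2, 1]
DECODER_STRIDES = [1, 2, 1, 2, 1, 2]

def trace_sizes(size):
    # Return the spatial size after the input and after each layer.
    sizes = [size]
    for stride in ENCODER_STRIDES:
        size = conv_same(size, stride)
        sizes.append(size)
    for stride in DECODER_STRIDES:
        size = conv_transpose_same(size, stride)
        sizes.append(size)
    return sizes

print(trace_sizes(200))  # bottleneck is 25, final size is 200 again
print(trace_sizes(150))  # final size is 152, not 150
```

Under these assumptions an input side that is not a multiple of 8 does not come back at its original size, so the per-pixel output would no longer line up with the target mask.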

1.2 Residual connections

# Uses the keras and layers imports from section 1.1.
def residual_block(x, filters, pooling=False):
    # Keep a reference to the block input for the skip connection.
    residual = x
    x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
    if pooling:
        x = layers.MaxPooling2D(2, padding="same")(x)
        # A strided 1x1 convolution matches the downsampled size and channel count.
        residual = layers.Conv2D(filters, 1, strides=2)(residual)
    elif filters != residual.shape[-1]:
        # Without pooling, project only when the channel count changes.
        residual = layers.Conv2D(filters, 1)(residual)
    x = layers.add([x, residual])
    return x

inputs = keras.Input(shape=(32, 32, 3))
x = layers.Rescaling(1./255)(inputs)
x = residual_block(x, filters=32, pooling=True)
x = residual_block(x, filters=64, pooling=True)
x = residual_block(x, filters=128, pooling=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()
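
The reason residual_block projects the residual is that the two tensors must have identical shapes before they can be added. The NumPy sketch below reproduces only the shape logic of the pooling case, with random values standing in for real activations and weights; conv1x1, max_pool_2x2, and the array names are hypothetical helpers, not Keras APIs.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, kernel, stride=1):
    # A 1x1 convolution is a per-pixel linear map over channels;
    # the stride subsamples the spatial grid. Bias omitted for brevity.
    return x[::stride, ::stride, :] @ kernel

def max_pool_2x2(x):
    # 2x2 max pooling, assuming even height and width.
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

residual = rng.normal(size=(32, 32, 3))   # block input with 3 channels
branch = rng.normal(size=(32, 32, 32))    # stand-in for the two 3x3 convolutions
branch = max_pool_2x2(branch)             # downsampled to (16, 16, 32)

kernel = rng.normal(size=(3, 32))         # stand-in for the 1x1 kernel weights
projected = conv1x1(residual, kernel, stride=2)

out = branch + projected                  # shapes now agree
print(branch.shape, projected.shape, out.shape)
```

Without the projection, adding a (32, 32, 3) tensor to a (16, 16, 32) tensor would fail, which is why the block falls back to a 1x1 convolution whenever the size or channel count changes.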

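1.3 Batch normalization

# The layer feeding BatchNormalization needs no bias: normalization
# re-centers its output, so a bias term would be redundant.
x = layers.Conv2D(32, 3, use_bias=False)(x)
x = layers.BatchNormalization()(x)
# Apply the activation after normalization.
x = layers.Activation("relu")(x)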
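
What the normalization step computes during training can be sketched in a few lines of NumPy. This is a simplified illustration, not the Keras implementation: it leaves out the moving averages used at inference time, and the function name and the eps value (chosen to match the Keras default of 1e-3) are assumptions.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    # Per-channel statistics over the batch and spatial axes.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Learnable scale (gamma) and shift (beta), one value per channel.
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 4, 32))
gamma = np.ones(32)
beta = np.zeros(32)
y = batch_norm_train(x, gamma, beta)

# A constant per-channel bias added before normalization is cancelled
# by the mean subtraction, so the output does not change.
bias = rng.normal(size=32)
y_biased = batch_norm_train(x + bias, gamma, beta)
print(np.allclose(y, y_biased))
```

The final check is the reason for use_bias=False in the preceding layer: any bias it added would be subtracted straight back out, while beta already provides a learnable shift.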

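1.4 Depthwise separable convolutions

# As in section 1.3, omit the bias and apply the activation once, after normalization.
x = layers.SeparableConv2D(32, 3, padding="same", use_bias=False)(x)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)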
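
The parameter savings of a depthwise separable convolution follow directly from how it is factored. The arithmetic below is a sketch that ignores biases and assumes a depth multiplier of 1; the 3x3 window with 64 input and 64 output channels is just an illustrative choice, and the function names are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, out_channels):
    # One kernel_size x kernel_size window per (input, output) channel pair.
    return kernel_size * kernel_size * in_channels * out_channels

def separable_conv2d_params(kernel_size, in_channels, out_channels):
    # Depthwise step: one spatial window per input channel.
    depthwise = kernel_size * kernel_size * in_channels
    # Pointwise step: a 1x1 convolution that mixes channels.
    pointwise = in_channels * out_channels
    return depthwise + pointwise

regular = conv2d_params(3, 64, 64)
separable = separable_conv2d_params(3, 64, 64)
print(regular)    # 36864
print(separable)  # 4672
print(round(regular / separable, 1))  # 7.9
```

The gap widens as the kernel window or the number of channels grows, since the regular convolution multiplies the two factors while the separable version adds them.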

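1.5 Visualizing intermediate activations

import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers

# Assumes `model` is a trained convnet and `img_tensor` is a preprocessed
# image batch of shape (1, height, width, 3).
layer_outputs = []
layer_names = []
for layer in model.layers:
    if isinstance(layer, (layers.Conv2D, layers.MaxPooling2D)):
        layer_outputs.append(layer.output)
        layer_names.append(layer.name)
# A model that returns the output of every convolution and pooling layer.
activation_model = keras.Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict(img_tensor)

# Plot channel 5 of the first recorded layer's activation.
first_layer_activation = activations[0]
plt.matshow(first_layer_activation[0, :, :, 5], cmap="viridis")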
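
Looking at one channel at a time gets tedious, so a common next step is to tile every channel of a layer into a single grid image. The helper below is a minimal sketch of that idea, assuming square feature maps; the function name, the 16-per-row layout, and the random stand-in activation are all hypothetical.

```python
import numpy as np

def tile_channels(activation, images_per_row=16):
    # activation: (1, size, size, n_channels), one layer's output for one image.
    size = activation.shape[1]
    n_channels = activation.shape[-1]
    n_rows = n_channels // images_per_row  # leftover channels are dropped
    grid = np.zeros((n_rows * size, images_per_row * size), dtype="float32")
    for idx in range(n_rows * images_per_row):
        row, col = divmod(idx, images_per_row)
        channel = activation[0, :, :, idx].astype("float32")
        # Normalize each channel to a displayable range.
        spread = channel.std()
        if spread > 0:
            channel = (channel - channel.mean()) / spread
        channel = np.clip(channel * 64 + 128, 0, 255)
        grid[row * size:(row + 1) * size, col * size:(col + 1) * size] = channel
    return grid

rng = np.random.default_rng(0)
fake_activation = rng.normal(size=(1, 8, 8, 32))  # stand-in for a real activation
grid = tile_channels(fake_activation)
print(grid.shape)  # (16, 128)
```

With real data, passing one entry of `activations` to this helper and showing the result with plt.imshow(grid, aspect="auto", cmap="viridis") would give one panel per channel.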

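1.6 Visualizing filters

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Assumes `feature_extractor` is a keras.Model that maps an image batch to the
# activation of the convolutional layer whose filters are being visualized.
def compute_loss(image, filter_index):
    activation = feature_extractor(image)
    # Ignore a 2-pixel border to avoid edge artifacts.
    filter_activation = activation[:, 2:-2, 2:-2, filter_index]
    return tf.reduce_mean(filter_activation)

@tf.function
def gradient_ascent_step(image, filter_index, learning_rate):
    with tf.GradientTape() as tape:
        # The image is a plain tensor, not a Variable, so watch it explicitly.
        tape.watch(image)
        loss = compute_loss(image, filter_index)
    grads = tape.gradient(loss, image)
    grads = tf.math.l2_normalize(grads)
    # Move the image in the direction that increases the filter's activation.
    image += learning_rate * grads
    return image

def generate_filter_pattern(filter_index):
    iterations = 30
    learning_rate = 10.
    # Start from a gray image with slight random noise.
    image = tf.random.uniform(minval=0.4, maxval=0.6, shape=(1, 200, 200, 3))
    for i in range(iterations):
        image = gradient_ascent_step(image, filter_index, learning_rate)
    return image[0].numpy()

def deprocess_image(image):
    # Rescale to mean 128 and standard deviation 64, then clip to [0, 255].
    image -= image.mean()
    image /= image.std()
    image *= 64
    image += 128
    image = np.clip(image, 0, 255).astype("uint8")
    # Crop the border to avoid edge artifacts.
    image = image[25:-25, 25:-25, :]
    return image

plt.axis("off")
plt.imshow(deprocess_image(generate_filter_pattern(filter_index=2)))

1.7 Class activation heatmaps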

import numpy as np
import tensorflow as tf
import matplotlib as mpl
import matplotlib.pyplot as plt
from tensorflow import keras

# The layer names below are those of keras.applications.Xception. `model` is
# assumed to be that network, `img_array` a preprocessed image batch, and
# `img_path` the path to the original image file.
last_conv_layer_name = "block14_sepconv2_act"
classifier_layer_names = ["avg_pool", "predictions"]
last_conv_layer = model.get_layer(last_conv_layer_name)
# Maps the input image to the activations of the last convolutional layer.
last_conv_layer_model = keras.Model(model.inputs, last_conv_layer.output)

# Maps those activations to the final class predictions.
classifier_input = keras.Input(shape=last_conv_layer.output.shape[1:])
x = classifier_input
for layer_name in classifier_layer_names:
    x = model.get_layer(layer_name)(x)
classifier_model = keras.Model(classifier_input, x)

# Gradient of the top predicted class with respect to the last conv activations.
with tf.GradientTape() as tape:
    last_conv_layer_output = last_conv_layer_model(img_array)
    tape.watch(last_conv_layer_output)
    preds = classifier_model(last_conv_layer_output)
    top_pred_index = tf.argmax(preds[0])
    top_class_channel = preds[:, top_pred_index]
grads = tape.gradient(top_class_channel, last_conv_layer_output)

# Weight each channel by its mean gradient, then average over channels.
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2)).numpy()
last_conv_layer_output = last_conv_layer_output.numpy()[0]
for i in range(pooled_grads.shape[-1]):
    last_conv_layer_output[:, :, i] *= pooled_grads[i]
heatmap = np.mean(last_conv_layer_output, axis=-1)

# Keep positive contributions only and normalize to [0, 1].
heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
plt.matshow(heatmap)

# Superimpose the heatmap on the original image.
img = keras.utils.load_img(img_path)
img = keras.utils.img_to_array(img)
heatmap = np.uint8(255 * heatmap)
# matplotlib.colormaps replaces cm.get_cmap, which newer Matplotlib releases removed.
jet = mpl.colormaps["jet"]
jet_colors = jet(np.arange(256))[:, :3]
jet_heatmap = jet_colors[heatmap]
jet_heatmap = keras.utils.array_to_img(jet_heatmap)
jet_heatmap = jet_heatmap.resize((img.shape[1], img.shape[0]))
jet_heatmap = keras.utils.img_to_array(jet_heatmap)
superimposed_img = jet_heatmap * 0.4 + img
superimposed_img = keras.utils.array_to_img(superimposed_img)
superimposed_img.save("elephant_cam.jpg")
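
The channel-weighting loop in the heatmap computation can also be written as a single broadcast multiplication. The sketch below is one possible NumPy formulation, checked against the loop on random stand-in arrays rather than real Xception outputs; the function name, the array shapes, and the guard against an all-zero heatmap are additions for illustration.

```python
import numpy as np

def grad_cam_heatmap(conv_output, grads):
    # conv_output: (h, w, c) feature maps; grads: (1, h, w, c) gradients.
    pooled_grads = grads.mean(axis=(0, 1, 2))   # one weight per channel
    weighted = conv_output * pooled_grads       # broadcasts over h and w
    heatmap = np.maximum(weighted.mean(axis=-1), 0)
    peak = heatmap.max()
    if peak > 0:  # avoid dividing by zero when nothing is positive
        heatmap = heatmap / peak
    return heatmap

rng = np.random.default_rng(0)
conv_output = rng.normal(size=(10, 10, 64))   # random stand-in values
grads = rng.normal(size=(1, 10, 10, 64))      # random stand-in values
heatmap = grad_cam_heatmap(conv_output, grads)

# Reference: the explicit per-channel loop.
reference = conv_output.copy()
pooled = grads.mean(axis=(0, 1, 2))
for i in range(pooled.shape[-1]):
    reference[:, :, i] *= pooled[i]
reference = np.maximum(reference.mean(axis=-1), 0)
reference /= reference.max()

print(np.allclose(heatmap, reference))
```

Besides being shorter, the broadcast version leaves its input untouched, whereas the loop modifies the activation array in place.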

Notable Quotes

Quote: Convnets are the best type of machine learning models for computer vision tasks.
Explanation: This underscores the importance of convolutional neural networks in computer vision.
Quote: Model architecture is often the difference between success and failure.
Explanation: This stresses how much model architecture matters in deep learning.
Quote: Residual connections help you train deeper networks.
Explanation: This points to the role residual connections play in addressing the vanishing gradient problem.
Quote: Batch normalization helps accelerate training and improve model performance.
Explanation: This sums up the main benefits of batch normalization.
Quote: Depthwise separable convolutions improve efficiency by reducing parameters and computations.
Explanation: This highlights how depthwise separable convolutions make models more efficient.
