Python Deep Learning in Practice, Chapter 9: Advanced Deep Learning for Computer Vision (Image Classification, Image Segmentation, Object Detection)
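Overview
Chapter 9 covers advanced deep learning techniques for computer vision, built around three core tasks: image classification, image segmentation, and object detection. It also introduces architecture patterns used in modern convolutional neural networks (convnets): residual connections, batch normalization, and depthwise separable convolutions. After working through the chapter, readers should be able to apply deep learning to complex computer vision problems and interpret what a convnet has learned.
Main topics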
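- Three core computer vision tasks
  - Image classification: assign one or more labels to an image.
  - Image segmentation: partition an image into regions, each usually corresponding to one category.
  - Object detection: draw bounding boxes around the objects in an image and associate each box with a class.
- Architecture patterns in modern convnets
  - Residual connections: skip connections that mitigate vanishing gradients, which makes deeper networks trainable.
  - Batch normalization: normalizes the activations passed between layers, which speeds up training and often improves model performance.
  - Depthwise separable convolutions: split a convolution into a per-channel spatial convolution and a channel-mixing pointwise convolution, reducing both parameter count and computation.
- Interpreting what convnets learn
  - Visualizing intermediate activations: display the outputs of successive convolution layers to see how the network extracts features step by step.
  - Visualizing filters: use gradient ascent in input space to generate the pattern a given filter responds to most strongly.
  - Class activation heatmaps: produce a heatmap showing which parts of an image matter most for a given class.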
Key code and algorithms
1.1 Image segmentation example
from tensorflow import keras
from tensorflow.keras import layers

def get_model(img_size, num_classes):
    inputs = keras.Input(shape=img_size + (3,))
    x = layers.Rescaling(1./255)(inputs)

    # Encoder: downsample with strided convolutions instead of max pooling,
    # which preserves more information about spatial location.
    x = layers.Conv2D(64, 3, strides=2, activation="relu", padding="same")(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(128, 3, strides=2, activation="relu", padding="same")(x)
    x = layers.Conv2D(128, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(256, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(256, 3, activation="relu", padding="same")(x)

    # Decoder: transposed convolutions upsample back to the input resolution.
    x = layers.Conv2DTranspose(256, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(256, 3, activation="relu", padding="same", strides=2)(x)
    x = layers.Conv2DTranspose(128, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(128, 3, activation="relu", padding="same", strides=2)(x)
    x = layers.Conv2DTranspose(64, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(64, 3, activation="relu", padding="same", strides=2)(x)

    # One softmax distribution over the classes for every pixel.
    outputs = layers.Conv2D(num_classes, 3, activation="softmax", padding="same")(x)
    model = keras.Model(inputs, outputs)
    return model

model = get_model(img_size=(200, 200), num_classes=3)
model.summary()
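To see why the decoder mirrors the encoder, it can help to trace one spatial dimension through the stack. The sketch below is plain Python rather than Keras. It assumes the usual `padding="same"` size rules (a strided `Conv2D` yields ceil(n / stride), a strided `Conv2DTranspose` yields n * stride), and the helper names are hypothetical.

```python
import math

def conv_same(size, stride=1):
    """Spatial size after a Conv2D layer with padding="same"."""
    return math.ceil(size / stride)

def conv_transpose_same(size, stride=1):
    """Spatial size after a Conv2DTranspose layer with padding="same"."""
    return size * stride

def trace_sizes(size):
    """Follow one spatial dimension through the segmentation model above."""
    sizes = [size]
    # Encoder: three (strided conv, plain conv) pairs, each halving the size.
    for _ in range(3):
        size = conv_same(size, stride=2)
        size = conv_same(size)
        sizes.append(size)
    # Decoder: three (plain transpose, strided transpose) pairs, each doubling it.
    for _ in range(3):
        size = conv_transpose_same(size)
        size = conv_transpose_same(size, stride=2)
        sizes.append(size)
    return sizes

print(trace_sizes(200))  # [200, 100, 50, 25, 50, 100, 200]
```

Under these rules an input size of 201 would end at 208, so the predicted mask would no longer line up with the target mask. Sizes divisible by 8 avoid the mismatch for this particular stack.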
1.2 Residual connections
def residual_block(x, filters, pooling=False):
    residual = x  # keep a reference to the block input
    x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
    if pooling:
        # Pooling halves the spatial size, so the residual is downsampled
        # with a strided 1x1 convolution to match.
        x = layers.MaxPooling2D(2, padding="same")(x)
        residual = layers.Conv2D(filters, 1, strides=2)(residual)
    elif filters != residual.shape[-1]:
        # Only the channel count differs: project the residual with a 1x1 convolution.
        residual = layers.Conv2D(filters, 1)(residual)
    x = layers.add([x, residual])
    return x

inputs = keras.Input(shape=(32, 32, 3))
x = layers.Rescaling(1./255)(inputs)
x = residual_block(x, filters=32, pooling=True)
x = residual_block(x, filters=64, pooling=True)
x = residual_block(x, filters=128, pooling=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model.summary()
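The shape bookkeeping in `residual_block` is the part that tends to cause errors, so here is a minimal NumPy sketch of it. It is not the Keras implementation: it assumes a 1x1 convolution without bias can be modeled as spatial subsampling followed by a matrix product over the channel axis, and the arrays are random stand-ins.

```python
import numpy as np

def pointwise_conv(x, weights, stride=1):
    """Bias-free 1x1 convolution on an (H, W, C_in) array: subsample, then mix channels."""
    return x[::stride, ::stride, :] @ weights

rng = np.random.default_rng(0)
residual = rng.normal(size=(32, 32, 3))     # block input
block_out = rng.normal(size=(16, 16, 32))   # stand-in for the conv + pooling output

# The two shapes differ, so a direct add would fail. Project the residual first.
projection = rng.normal(size=(3, 32))       # hypothetical 1x1 kernel (C_in, C_out)
projected = pointwise_conv(residual, projection, stride=2)

merged = block_out + projected
print(projected.shape, merged.shape)  # (16, 16, 32) (16, 16, 32)
```

The stride of 2 matches the halving done by the pooling layer, and the kernel's output dimension matches the new filter count, which are the two conditions the `if`/`elif` branches above take care of.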
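1.3 Batch normalization
# x is the output of a preceding layer.
# The convolution has no bias: normalization recenters its output anyway.
x = layers.Conv2D(32, 3, use_bias=False)(x)
x = layers.BatchNormalization()(x)
# The activation is applied after normalization.
x = layers.Activation("relu")(x)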
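To make the `use_bias=False` choice concrete, the following NumPy sketch implements the training-time normalization step by hand. It is a simplification, not the Keras layer: it ignores the moving averages used at inference time, and `gamma`, `beta`, and the random input are illustrative assumptions.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each channel over the batch and spatial axes, then scale and shift."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
conv_out = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 4, 32))  # (batch, H, W, channels)
gamma, beta = np.ones(32), np.zeros(32)

normalized = batch_norm_train(conv_out, gamma, beta)

# A per-channel bias added before normalization is cancelled by the mean subtraction.
bias = rng.normal(size=32)
with_bias = batch_norm_train(conv_out + bias, gamma, beta)
print(np.allclose(normalized, with_bias))  # True
```

Because the bias has no effect on the normalized output, dropping it from the convolution saves a few parameters without changing what the layer can represent.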
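1.4 Depthwise separable convolution
# x is the output of a preceding layer.
# The convolution carries no activation of its own, so that ReLU is applied
# only once, after batch normalization.
x = layers.SeparableConv2D(32, 3, padding="same")(x)
x = layers.BatchNormalization()(x)
x = layers.Activation("relu")(x)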
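The savings from separating the spatial and channel steps can be checked with a little arithmetic. The sketch below counts weights only (biases ignored) and assumes a depth multiplier of 1; the function names are hypothetical.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a regular convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = kernel * kernel * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes channels
    return depthwise + pointwise

regular = conv_params(3, 64, 64)
separable = separable_conv_params(3, 64, 64)
print(regular, separable)  # 36864 4672
```

For a 3x3 window with 64 input and 64 output channels, the separable version needs roughly one eighth of the weights, and the gap widens as the channel counts or the window size grow.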
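1.5 Visualizing intermediate activations
from tensorflow.keras import layers
import matplotlib.pyplot as plt

# `model` is a trained convnet and `img_tensor` is a preprocessed image
# batch of shape (1, height, width, 3); both are assumed to exist already.
layer_outputs = []
layer_names = []
for layer in model.layers:
    # Collect the outputs of every convolution and pooling layer.
    if isinstance(layer, (layers.Conv2D, layers.MaxPooling2D)):
        layer_outputs.append(layer.output)
        layer_names.append(layer.name)

# A model that returns all of these activations for a given input.
activation_model = keras.Model(inputs=model.input, outputs=layer_outputs)
activations = activation_model.predict(img_tensor)

# Show channel 5 of the first collected layer's activation.
first_layer_activation = activations[0]
plt.matshow(first_layer_activation[0, :, :, 5], cmap="viridis")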
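9.4.2 Filter visualization
import numpy as np
import tensorflow as tf

# `feature_extractor` is assumed to be a keras.Model that maps an input image
# to the activations of the convolution layer being visualized.
def compute_loss(image, filter_index):
    activation = feature_extractor(image)
    # Drop a 2-pixel border to avoid edge artifacts.
    filter_activation = activation[:, 2:-2, 2:-2, filter_index]
    return tf.reduce_mean(filter_activation)

@tf.function
def gradient_ascent_step(image, filter_index, learning_rate):
    with tf.GradientTape() as tape:
        # The image is a plain tensor, not a variable, so it must be watched explicitly.
        tape.watch(image)
        loss = compute_loss(image, filter_index)
    grads = tape.gradient(loss, image)
    grads = tf.math.l2_normalize(grads)
    # Move the image in the direction that increases the filter's activation.
    image += learning_rate * grads
    return image

def generate_filter_pattern(filter_index):
    iterations = 30
    learning_rate = 10.
    # Start from a gray image with a little random noise.
    image = tf.random.uniform(minval=0.4, maxval=0.6, shape=(1, 200, 200, 3))
    for i in range(iterations):
        image = gradient_ascent_step(image, filter_index, learning_rate)
    return image[0].numpy()

def deprocess_image(image):
    # Rescale to a displayable uint8 image centered on mid-gray.
    image -= image.mean()
    image /= image.std()
    image *= 64
    image += 128
    image = np.clip(image, 0, 255).astype("uint8")
    # Crop the border to hide edge artifacts.
    image = image[25:-25, 25:-25, :]
    return image

plt.axis("off")
plt.imshow(deprocess_image(generate_filter_pattern(filter_index=2)))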
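The update rule in `gradient_ascent_step` can be tried without a trained network. The NumPy sketch below swaps the convnet for a toy linear "filter" whose response is the sum of `image * kernel`, so the gradient with respect to the image is simply the kernel. That substitution, the 8x8 size, and the function names are all assumptions made for illustration.

```python
import numpy as np

def filter_response(image, kernel):
    """Toy stand-in for a filter activation: correlation of image and kernel."""
    return float(np.sum(image * kernel))

def gradient_ascent_step(image, kernel, learning_rate):
    grads = kernel.copy()                     # d(response)/d(image) for this toy filter
    grads /= np.linalg.norm(grads) + 1e-12    # L2-normalize, as in the code above
    return image + learning_rate * grads

rng = np.random.default_rng(0)
kernel = rng.normal(size=(8, 8))              # hypothetical filter
image = rng.uniform(0.4, 0.6, size=(8, 8))    # start from gray-ish noise

start = filter_response(image, kernel)
for _ in range(30):
    image = gradient_ascent_step(image, kernel, learning_rate=10.0)
end = filter_response(image, kernel)
print(end > start)  # True
```

Normalizing the gradient keeps the size of each update fixed by the learning rate alone, which is why the real code can use a large value such as 10 without the steps blowing up.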
1.6 Class activation heatmaps
import numpy as np
import tensorflow as tf

# `model` is assumed to be a pretrained classifier containing the layers named
# below, `img_array` a preprocessed input batch, and `img_path` the path of the
# original image file.
last_conv_layer_name = "block14_sepconv2_act"
classifier_layer_names = ["avg_pool", "predictions"]

# Model 1: input image -> activations of the last convolution layer.
last_conv_layer = model.get_layer(last_conv_layer_name)
last_conv_layer_model = keras.Model(model.inputs, last_conv_layer.output)

# Model 2: last convolution activations -> class predictions.
classifier_input = keras.Input(shape=last_conv_layer.output.shape[1:])
x = classifier_input
for layer_name in classifier_layer_names:
    x = model.get_layer(layer_name)(x)
classifier_model = keras.Model(classifier_input, x)

# Gradient of the top predicted class with respect to the last convolution output.
with tf.GradientTape() as tape:
    last_conv_layer_output = last_conv_layer_model(img_array)
    tape.watch(last_conv_layer_output)
    preds = classifier_model(last_conv_layer_output)
    top_pred_index = tf.argmax(preds[0])
    top_class_channel = preds[:, top_pred_index]
grads = tape.gradient(top_class_channel, last_conv_layer_output)

# Weight each channel by its mean gradient, then average over channels.
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2)).numpy()
last_conv_layer_output = last_conv_layer_output.numpy()[0]
for i in range(pooled_grads.shape[-1]):
    last_conv_layer_output[:, :, i] *= pooled_grads[i]
heatmap = np.mean(last_conv_layer_output, axis=-1)

# Keep positive contributions only and scale to [0, 1].
heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
plt.matshow(heatmap)

# Superimpose the heatmap on the original image.
img = keras.utils.load_img(img_path)
img = keras.utils.img_to_array(img)
heatmap = np.uint8(255 * heatmap)
# plt.get_cmap replaces cm.get_cmap, which recent Matplotlib releases removed.
jet = plt.get_cmap("jet")
jet_colors = jet(np.arange(256))[:, :3]
jet_heatmap = jet_colors[heatmap]
jet_heatmap = keras.utils.array_to_img(jet_heatmap)
jet_heatmap = jet_heatmap.resize((img.shape[1], img.shape[0]))
jet_heatmap = keras.utils.img_to_array(jet_heatmap)
superimposed_img = jet_heatmap * 0.4 + img
superimposed_img = keras.utils.array_to_img(superimposed_img)
superimposed_img.save("elephant_cam.jpg")
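The channel-weighting step in the middle of the listing above can be isolated and written without the explicit loop. The following NumPy sketch takes activations and gradients as plain arrays; the random inputs and their (10, 10, 2048) shape are stand-ins, and the guard against an all-zero heatmap is an addition that the original code does not have.

```python
import numpy as np

def grad_cam_heatmap(conv_output, grads):
    """Combine (H, W, C) activations and gradients into an (H, W) heatmap in [0, 1]."""
    channel_weights = grads.mean(axis=(0, 1))          # one importance weight per channel
    heatmap = (conv_output * channel_weights).mean(axis=-1)
    heatmap = np.maximum(heatmap, 0)                   # keep positive contributions only
    peak = heatmap.max()
    return heatmap / peak if peak > 0 else heatmap     # avoid dividing by zero

rng = np.random.default_rng(0)
conv_output = rng.uniform(size=(10, 10, 2048))         # stand-in activations
grads = rng.normal(size=(10, 10, 2048))                # stand-in gradients

heatmap = grad_cam_heatmap(conv_output, grads)
print(heatmap.shape)  # (10, 10)
```

Broadcasting `channel_weights` over the last axis does the same work as the per-channel loop, and returning a new array leaves the original activations untouched.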
Notable quotes
- "Convnets are the best type of machine learning models for computer vision tasks."
  Explanation: this stresses how central convnets are to computer vision.
- "Model architecture is often the difference between success and failure."
  Explanation: this stresses how much the choice of architecture matters in deep learning.
- "Residual connections help you train deeper networks."
  Explanation: this points to the role residual connections play in mitigating vanishing gradients.
- "Batch normalization helps accelerate training and improve model performance."
  Explanation: this sums up the main benefits of batch normalization.
- "Depthwise separable convolutions improve efficiency by reducing parameters and computations."
  Explanation: this stresses how depthwise separable convolutions make models more efficient.
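Summary
After working through this chapter, readers should have a grasp of advanced deep learning techniques for computer vision, covering image classification, image segmentation, and object detection. Understanding the architecture patterns of modern convnets and the methods for interpreting them helps in building better-performing models and in understanding how those models reach their decisions.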