Model Evaluation Metrics Explained (Accuracy, Recall, F1, AUC, and More)

海绵宝宝de派小星 · Published 2026-01-21 22:45:00

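Contents

Why Evaluation Metrics Matter
The Confusion Matrix: The Foundation of Evaluation
Accuracy: The Most Intuitive Metric
- Limitations of Accuracy
Precision: How Many Predicted Positives Are Correct
- When to Use Precision
Recall: How Many Actual Positives Are Found
- When to Use Recall
The Precision-Recall Trade-off
F1 Score: The Harmonic Mean of Precision and Recall
- F-beta Score: An Adjustable Trade-off
ROC Curve and AUC: A Comprehensive View of Classifier Performance
- ROC Curve Basics
- AUC: Area Under the ROC Curve
- ROC vs. PR Curves
Evaluation Metrics for Multiclass Problems
- Macro and Micro Averaging
- Visualizing the Confusion Matrix
Evaluation Metrics for Regression
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
- R² Score
How to Choose an Evaluation Metric
Cross-Validation: More Reliable Evaluation
- K-Fold Cross-Validation
- Stratified K-Fold Cross-Validation
Case Study: A Complete Model Evaluation Workflow
Summary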

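In a machine learning project, training a model is only the first step; what matters more is evaluating how well it performs. Different metrics suit different scenarios, and choosing the right one is critical for optimizing and deploying a model. This article walks through the most commonly used evaluation metrics in machine learning, including accuracy, precision, recall, F1 score, and AUC, with code examples to build intuition.

Why Evaluation Metrics Matter

Before looking at specific metrics, it helps to understand why we need them at all.

Imagine you have trained a model to detect credit card fraud. The test set contains 10,000 transactions, only 10 of which are fraudulent. If the model simply predicts every transaction as legitimate, its accuracy is 99.9%, which looks excellent. Yet it detects none of the fraudulent transactions, so it is useless for the task.

The example shows that a single metric is often not enough to judge a model; we need to evaluate it from several angles.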

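The Confusion Matrix: The Foundation of Evaluation

Before introducing the individual metrics, we need the concept of a confusion matrix, the basic tool for evaluating classification models.

For a binary classification problem, the confusion matrix is a 2x2 matrix. With rows as true labels and columns as predicted labels (the layout sklearn uses), it looks like this:

                   Predicted negative   Predicted positive
Actual negative    TN                   FP
Actual positive    FN                   TP

Specifically:

True Positive (TP): predicted positive, actually positive
False Positive (FP): predicted positive, actually negative (Type I error)
False Negative (FN): predicted negative, actually positive (Type II error)
True Negative (TN): predicted negative, actually negative

The following code computes and visualizes a confusion matrix: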

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(y_true, y_pred, labels=('Negative', 'Positive')):
    """Plot a binary confusion matrix as a heatmap and return it."""
    cm = confusion_matrix(y_true, y_pred)

    plt.figure(figsize=(8, 6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=list(labels), yticklabels=list(labels))
    plt.xlabel('Predicted label')
    plt.ylabel('True label')
    plt.title('Confusion Matrix')

    # Annotate each cell with its role (rows are true labels, columns are predictions)
    plt.text(0.5, 0.25, f'TN={cm[0,0]}', ha='center', va='center',
             fontsize=12, color='darkblue')
    plt.text(1.5, 0.25, f'FP={cm[0,1]}', ha='center', va='center',
             fontsize=12, color='darkred')
    plt.text(0.5, 1.25, f'FN={cm[1,0]}', ha='center', va='center',
             fontsize=12, color='darkred')
    plt.text(1.5, 1.25, f'TP={cm[1,1]}', ha='center', va='center',
             fontsize=12, color='darkblue')

    plt.show()
    return cm

# Example data
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

cm = plot_confusion_matrix(y_true, y_pred)
print(f"Confusion matrix:\n{cm}")
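When only the four counts are needed rather than a plot, they can be read straight out of the matrix. The sketch below reuses the same example labels; the variable names are illustrative only.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

# For binary labels {0, 1}, ravel() flattens the 2x2 matrix row by row,
# which yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```

Every binary metric discussed in this article can be derived from these four numbers.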

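Accuracy: The Most Intuitive Metric

Accuracy is the most intuitive and widely used metric: the proportion of samples the model predicts correctly.

The formula for accuracy:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Accuracy ranges over [0, 1]; higher is better.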

import numpy as np

def calculate_accuracy(y_true, y_pred):
    """Compute accuracy as the fraction of predictions that match the labels."""
    correct = np.sum(y_true == y_pred)
    total = len(y_true)
    accuracy = correct / total
    return accuracy

# Compare against sklearn
from sklearn.metrics import accuracy_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

acc_manual = calculate_accuracy(y_true, y_pred)
acc_sklearn = accuracy_score(y_true, y_pred)

print(f"Accuracy (manual): {acc_manual:.4f}")
print(f"Accuracy (sklearn): {acc_sklearn:.4f}")

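Limitations of Accuracy

Accuracy is intuitive, but it has a serious flaw: on class-imbalanced datasets it can be misleading.

Return to the credit card fraud example:

Total samples: 10,000 transactions
Legitimate transactions: 9,990
Fraudulent transactions: 10

If the model predicts every transaction as legitimate:

Accuracy = 9990 / 10000 = 99.9%

That looks high, but the model detects no fraud at all, so it fails at its actual job.

For class-imbalanced problems, therefore, we need other metrics.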
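The fraud scenario is easy to reproduce. The following is a minimal sketch with synthetic labels (not real transaction data); it also reports recall on the fraud class, a metric introduced later in this article, to show what accuracy hides.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 10,000 transactions: 9,990 legitimate (0) and 10 fraudulent (1)
y_true = np.array([0] * 9990 + [1] * 10)

# A "model" that labels every transaction as legitimate
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)
fraud_recall = recall_score(y_true, y_pred)  # fraction of fraud cases caught

print(f"Accuracy: {accuracy:.4f}")
print(f"Recall on fraud: {fraud_recall:.4f}")
```

Accuracy comes out at 0.999 while recall on the fraud class is 0, which is exactly the failure described above.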

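Precision: How Many Predicted Positives Are Correct

Precision (also called positive predictive value) answers the question: of all samples predicted positive, what fraction are truly positive?

The formula for precision:

Precision = TP / (TP + FP)

Precision measures how trustworthy the positive predictions are. When the goal is to reduce false positives, precision is the metric to watch.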

import numpy as np
from sklearn.metrics import precision_score

def calculate_precision(y_true, y_pred):
    """Compute precision = TP / (TP + FP) for binary labels {0, 1}."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))

    # No positive predictions at all: precision is undefined, return 0 by convention
    if tp + fp == 0:
        return 0

    precision = tp / (tp + fp)
    return precision

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

prec_manual = calculate_precision(y_true, y_pred)
prec_sklearn = precision_score(y_true, y_pred)

print(f"Precision (manual): {prec_manual:.4f}")
print(f"Precision (sklearn): {prec_sklearn:.4f}")

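When to Use Precision

Precision fits scenarios where false positives are costly:

Spam detection: legitimate mail should not be flagged as spam
Recommender systems: users should not be shown content they do not care about
Medical diagnosis: healthy people should not be diagnosed as sick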

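Recall: How Many Actual Positives Are Found

Recall (also called sensitivity) answers the question: of all samples that are truly positive, what fraction does the model correctly predict as positive?

The formula for recall:

Recall = TP / (TP + FN)

Recall measures coverage of the positive class. When the goal is to reduce false negatives, recall is the metric to watch.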

import numpy as np
from sklearn.metrics import recall_score

def calculate_recall(y_true, y_pred):
    """Compute recall = TP / (TP + FN) for binary labels {0, 1}."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))

    # No actual positives at all: recall is undefined, return 0 by convention
    if tp + fn == 0:
        return 0

    recall = tp / (tp + fn)
    return recall

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

rec_manual = calculate_recall(y_true, y_pred)
rec_sklearn = recall_score(y_true, y_pred)

print(f"Recall (manual): {rec_manual:.4f}")
print(f"Recall (sklearn): {rec_sklearn:.4f}")

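When to Use Recall

Recall fits scenarios where false negatives are costly:

Disease screening: patients with the disease should not be missed
Fraud detection: fraudulent transactions should not slip through
Security monitoring: threats should not go undetected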

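The Precision-Recall Trade-off

Precision and recall usually pull against each other: improving one tends to lower the other. Raising the classification threshold makes the model more conservative about predicting positive, which typically raises precision and lowers recall; lowering the threshold does the opposite.

The following code demonstrates the trade-off: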

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

# Generate data
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                          n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predicted probability of the positive class
y_scores = model.predict_proba(X_test)[:, 1]

# Precision and recall at every candidate threshold
precisions, recalls, thresholds = precision_recall_curve(y_test, y_scores)

# Plot precision and recall against the threshold
# (both arrays have one more element than thresholds, hence the [:-1])
plt.figure(figsize=(10, 6))
plt.plot(thresholds, precisions[:-1], label='Precision', linewidth=2)
plt.plot(thresholds, recalls[:-1], label='Recall', linewidth=2)
plt.xlabel('Classification threshold')
plt.ylabel('Score')
plt.title('Precision and Recall vs. Threshold')
plt.legend()
plt.grid(True)
plt.show()

# Plot the PR curve
plt.figure(figsize=(8, 6))
plt.plot(recalls, precisions, linewidth=2)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve (PR Curve)')
plt.grid(True)
plt.show()
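The same trade-off can be seen on a handful of hand-picked scores. This is a small illustrative sketch: the six scores and the two thresholds are made up for the example, and `precision_recall_at` is a hypothetical helper, not part of sklearn.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([0, 0, 1, 0, 1, 1])
y_scores = np.array([0.10, 0.40, 0.35, 0.60, 0.65, 0.90])

def precision_recall_at(threshold):
    """Binarize the scores at `threshold` and return (precision, recall)."""
    y_pred = (y_scores >= threshold).astype(int)
    return precision_score(y_true, y_pred), recall_score(y_true, y_pred)

low_p, low_r = precision_recall_at(0.30)    # permissive threshold
high_p, high_r = precision_recall_at(0.62)  # strict threshold

print(f"threshold=0.30 -> precision={low_p:.2f}, recall={low_r:.2f}")
print(f"threshold=0.62 -> precision={high_p:.2f}, recall={high_r:.2f}")
```

With the permissive threshold every positive is found but two negatives are flagged as well; with the strict threshold there are no false positives, but one positive is missed.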

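F1 Score: The Harmonic Mean of Precision and Recall

Since both precision and recall matter, can one metric combine them? That is exactly what the F1 score does.

The F1 score is the harmonic mean of precision and recall:

F1 = 2 * (Precision * Recall) / (Precision + Recall)

Why the harmonic mean rather than the arithmetic mean? The harmonic mean is dominated by the smaller value, so F1 is high only when both precision and recall are high.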

import numpy as np
from sklearn.metrics import f1_score

def calculate_f1(y_true, y_pred):
    """Compute F1 from calculate_precision and calculate_recall defined earlier."""
    precision = calculate_precision(y_true, y_pred)
    recall = calculate_recall(y_true, y_pred)

    # Both are zero: F1 is undefined, return 0 by convention
    if precision + recall == 0:
        return 0

    f1 = 2 * (precision * recall) / (precision + recall)
    return f1

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

f1_manual = calculate_f1(y_true, y_pred)
f1_sklearn = f1_score(y_true, y_pred)

print(f"F1 score (manual): {f1_manual:.4f}")
print(f"F1 score (sklearn): {f1_sklearn:.4f}")
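To see why the harmonic mean is preferred over the arithmetic mean, consider a lopsided model. The numbers below are hypothetical, chosen only to make the contrast obvious.

```python
# A lopsided model: perfect precision, very poor recall (hypothetical values)
precision = 1.0
recall = 0.1

arithmetic_mean = (precision + recall) / 2
harmonic_mean = 2 * precision * recall / (precision + recall)  # this is F1

print(f"Arithmetic mean: {arithmetic_mean:.4f}")
print(f"Harmonic mean (F1): {harmonic_mean:.4f}")
```

The arithmetic mean is 0.55, which would make the model look mediocre rather than bad, while the harmonic mean of about 0.18 stays close to the weaker of the two values.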

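F-beta Score: An Adjustable Trade-off

Sometimes one of precision or recall matters more than the other. In that case, use the F-beta score:

F_β = (1 + β²) * (Precision * Recall) / (β² * Precision + Recall)

Here β is a weighting parameter:

β = 1: the F1 score, precision and recall weighted equally
β < 1: more weight on precision
β > 1: more weight on recall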

import numpy as np
from sklearn.metrics import fbeta_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

# F-beta for several values of beta
for beta in [0.5, 1, 2]:
    f_beta = fbeta_score(y_true, y_pred, beta=beta)
    print(f"F{beta} score: {f_beta:.4f}")
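The F-beta formula above can also be evaluated by hand and compared with sklearn. In this sketch `fbeta_manual` is a hypothetical helper written for illustration; it assumes precision and recall are not both zero.

```python
import numpy as np
from sklearn.metrics import fbeta_score, precision_score, recall_score

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

def fbeta_manual(precision, recall, beta):
    """Evaluate F_beta = (1 + b^2) * P * R / (b^2 * P + R)."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

for beta in [0.5, 1, 2]:
    manual = fbeta_manual(precision, recall, beta)
    reference = fbeta_score(y_true, y_pred, beta=beta)
    print(f"beta={beta}: manual={manual:.4f}, sklearn={reference:.4f}")
```

Here precision (1.0) is higher than recall (0.8), so the score falls as β grows and recall receives more weight.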

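ROC Curve and AUC: A Comprehensive View of Classifier Performance

The ROC curve (Receiver Operating Characteristic curve) is an important tool for evaluating binary classifiers. It shows how the true positive rate and the false positive rate relate as the classification threshold varies.

ROC Curve Basics

The ROC curve is built from two quantities:

True Positive Rate (TPR): the same as recall, TPR = TP / (TP + FN)
False Positive Rate (FPR): FPR = FP / (FP + TN)

The ROC curve plots FPR on the x-axis against TPR on the y-axis, with one point per threshold.

AUC: Area Under the ROC Curve

AUC (Area Under the Curve) is the area under the ROC curve. It summarizes model performance in a single number.

AUC ranges over [0, 1]:

AUC = 1: a perfect classifier
AUC = 0.5: equivalent to random guessing
AUC < 0.5: worse than random guessing (inverting the model's predictions would give an AUC above 0.5)

AUC also has a probabilistic interpretation: it is the probability that, for a randomly chosen positive sample and a randomly chosen negative sample, the model assigns the higher score to the positive one.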

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score, auc
import matplotlib.pyplot as plt

# Generate data and train a model
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression()
model.fit(X_train, y_train)

# Predicted probability of the positive class
y_scores = model.predict_proba(X_test)[:, 1]

# Compute the ROC curve and the area under it
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)

# Plot the ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2,
         label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--',
         label='Random guessing (AUC = 0.50)')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.title('ROC Curve')
plt.legend(loc="lower right")
plt.grid(True)
plt.show()

print(f"AUC score: {roc_auc:.4f}")
print(f"Computed directly with sklearn: {roc_auc_score(y_test, y_scores):.4f}")
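The probabilistic interpretation of AUC can be checked directly by comparing every positive score with every negative score. The sketch below uses eight made-up scores; counting ties as half a win is the usual convention and is stated here as an assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_scores = np.array([0.10, 0.40, 0.35, 0.80, 0.65, 0.90, 0.20, 0.70])

pos = y_scores[y_true == 1]
neg = y_scores[y_true == 0]

# Score difference for every (positive, negative) pair
diff = pos[:, None] - neg[None, :]

# Fraction of pairs where the positive is ranked higher; ties count as half
pairwise_auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

print(f"Pairwise estimate: {pairwise_auc:.4f}")
print(f"roc_auc_score:     {roc_auc_score(y_true, y_scores):.4f}")
```

Of the 16 pairs, the positive sample wins 12, so both numbers come out at 0.75. This also shows that AUC depends only on how the scores are ranked, not on their absolute values.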

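ROC vs. PR Curves

Both ROC and PR curves evaluate model performance, but they suit different situations:

ROC curve: a good default when the classes are roughly balanced; because FPR is computed over the negative class, it can look optimistic when negatives vastly outnumber positives
PR curve: better suited to imbalanced datasets where the positive class is the focus

Under class imbalance, the PR curve reflects performance on the positive class more faithfully.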

# Continues from the ROC example above (reuses y_test, y_scores, fpr, tpr, roc_auc)
from sklearn.metrics import average_precision_score, precision_recall_curve
import matplotlib.pyplot as plt

# Average precision (AP) summarizes the PR curve as a single number
ap = average_precision_score(y_test, y_scores)
print(f"Average precision (AP): {ap:.4f}")

# Plot the ROC and PR curves side by side
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# ROC curve
axes[0].plot(fpr, tpr, lw=2, label=f'AUC = {roc_auc:.2f}')
axes[0].plot([0, 1], [0, 1], 'k--', lw=2)
axes[0].set_xlabel('False Positive Rate')
axes[0].set_ylabel('True Positive Rate')
axes[0].set_title('ROC Curve')
axes[0].legend()
axes[0].grid(True)

# PR curve
precisions, recalls, _ = precision_recall_curve(y_test, y_scores)
axes[1].plot(recalls, precisions, lw=2, label=f'AP = {ap:.2f}')
axes[1].set_xlabel('Recall')
axes[1].set_ylabel('Precision')
axes[1].set_title('PR Curve')
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()
plt.show()
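How far the two summaries can diverge under imbalance is easiest to see on constructed data. The scores below are synthetic and deliberately simple (they do not come from a trained model): 10 positives all score higher than roughly 90% of 1,000 negatives.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# 1,000 negatives with evenly spaced scores 0.000, 0.001, ..., 0.999
neg_scores = np.linspace(0.0, 1.0, 1000, endpoint=False)
# 10 positives that all score 0.9005, above about 90% of the negatives
pos_scores = np.full(10, 0.9005)

y_true = np.concatenate([np.zeros(1000, dtype=int), np.ones(10, dtype=int)])
y_scores = np.concatenate([neg_scores, pos_scores])

roc_auc = roc_auc_score(y_true, y_scores)
ap = average_precision_score(y_true, y_scores)

print(f"ROC AUC: {roc_auc:.4f}")
print(f"Average precision: {ap:.4f}")
```

ROC AUC is about 0.90, which sounds strong, yet average precision is only about 0.09: to catch the 10 positives, the model must also flag 99 negatives that score even higher. In this constructed case the PR-based summary exposes a problem that the ROC-based one hides.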

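Evaluation Metrics for Multiclass Problems

The metrics so far target binary classification. For multiclass problems they need to be extended.

Macro and Micro Averaging

For multiclass problems, macro and micro averaging give overall precision, recall, and F1 scores.

Macro average: compute the metric for each class, then take the unweighted mean
Micro average: sum TP, FP, and FN over all classes, then compute the metric from the totals
Weighted average: compute the metric for each class, then average with each class weighted by its number of true samples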

from sklearn.metrics import classification_report
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the data
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Print the classification report
print("Classification report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Compare the averaging strategies
from sklearn.metrics import precision_score, recall_score, f1_score

print("\nComparison of averaging strategies:")
for average in ['macro', 'micro', 'weighted']:
    precision = precision_score(y_test, y_pred, average=average)
    recall = recall_score(y_test, y_pred, average=average)
    f1 = f1_score(y_test, y_pred, average=average)
    print(f"{average.capitalize()} average - precision: {precision:.4f}, "
          f"recall: {recall:.4f}, F1: {f1:.4f}")
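The difference between macro and micro averaging is clearest when computed by hand from a confusion matrix. The sketch below uses a small made-up three-class example in which class 0 is much larger than the others.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score

y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2])

cm = confusion_matrix(y_true, y_pred)
tp = np.diag(cm)           # correct predictions per class
fp = cm.sum(axis=0) - tp   # predicted as the class but actually another class

# Macro: per-class precision first, then an unweighted mean
macro_precision = np.mean(tp / (tp + fp))
# Micro: pool the counts over all classes first, then one ratio
micro_precision = tp.sum() / (tp.sum() + fp.sum())

print(f"Macro precision: {macro_precision:.4f}")
print(f"Micro precision: {micro_precision:.4f}")
```

Macro averaging gives every class the same weight, so the weaker results on the two small classes pull it down; micro averaging is dominated by the large class. Both values match `precision_score` with the corresponding `average` argument.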

Visualizing the Confusion Matrix

For multiclass problems, the confusion matrix shows clearly how the model performs on each class:

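# Continues from the iris example above (reuses iris, y_test, y_pred)
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

# Compute the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Visualize it as a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=iris.target_names,
            yticklabels=iris.target_names)
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Multiclass Confusion Matrix')
plt.show()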

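Evaluation Metrics for Regression

Everything above concerns classification. Regression problems call for different metrics.

Mean Squared Error (MSE)

MSE is the most commonly used regression metric. It is the mean of the squared differences between predictions and true values:

MSE = (1/n) * Σ(y_i - ŷ_i)²

Because errors are squared, MSE is sensitive to outliers.

Root Mean Squared Error (RMSE)

RMSE is the square root of MSE. It has the same units as the target variable, which makes it easier to interpret:

RMSE = √MSE

Mean Absolute Error (MAE)

MAE is the mean of the absolute differences between predictions and true values:

MAE = (1/n) * Σ|y_i - ŷ_i|

MAE is less sensitive to outliers than MSE.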
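The different sensitivity to outliers shows up even on four points. The values below are made up for illustration: first every prediction is off by 1, then a single prediction is off by 10.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = np.array([10.0, 12.0, 14.0, 16.0])

y_pred_clean = y_true + 1.0               # every prediction off by 1
y_pred_outlier = y_pred_clean.copy()
y_pred_outlier[-1] = y_true[-1] + 10.0    # one prediction off by 10

mse_clean = mean_squared_error(y_true, y_pred_clean)
mae_clean = mean_absolute_error(y_true, y_pred_clean)
mse_outlier = mean_squared_error(y_true, y_pred_outlier)
mae_outlier = mean_absolute_error(y_true, y_pred_outlier)

print(f"Without outlier: MSE={mse_clean:.2f}, MAE={mae_clean:.2f}")
print(f"With outlier:    MSE={mse_outlier:.2f}, MAE={mae_outlier:.2f}")
```

The single large error raises MAE from 1.00 to 3.25 but raises MSE from 1.00 to 25.75, because the error of 10 contributes 100 after squaring.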

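R² Score

The R² score (coefficient of determination) is the proportion of the target variable's variance that the model explains:

R² = 1 - (SS_res / SS_tot)

where SS_res is the residual sum of squares, Σ(y_i - ŷ_i)², and SS_tot is the total sum of squares, Σ(y_i - ȳ)².

R² is at most 1, and higher values indicate a better fit. A model that always predicts the mean of y scores 0, and on held-out data R² can be negative when the model does worse than that baseline.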

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np

# Generate regression data: y = 2x + 1 plus Gaussian noise
np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2 * X + 1 + np.random.randn(100, 1) * 2

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Compute the metrics
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Regression metrics:")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")
print(f"R²: {r2:.4f}")

# Plot predictions against true values
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.6)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()],
         'r--', lw=2, label='Perfect prediction')
plt.xlabel('True value')
plt.ylabel('Predicted value')
plt.title(f'Regression Predictions (R² = {r2:.4f})')
plt.legend()
plt.grid(True)
plt.show()
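The R² formula can be worked through by hand on three points, which also shows what the value looks like for a poor model. In this sketch `r2_manual` is a hypothetical helper and the predictions are made up for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

def r2_manual(y_true, y_pred):
    """Evaluate R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0])

mean_pred = np.array([2.0, 2.0, 2.0])      # always predicts the mean of y_true
reversed_pred = np.array([3.0, 2.0, 1.0])  # gets the trend backwards

r2_mean = r2_manual(y_true, mean_pred)
r2_reversed = r2_manual(y_true, reversed_pred)

print(f"Mean predictor:     R² = {r2_mean:.2f}")
print(f"Reversed predictor: R² = {r2_reversed:.2f}")
```

Predicting the mean gives R² = 0. The reversed predictions give SS_res = 8 against SS_tot = 2, so R² = 1 - 4 = -3, which `r2_score` reports as well. A negative R² therefore means the model is worse than simply predicting the mean.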

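How to Choose an Evaluation Metric

With so many metrics available, how do you pick one? Some guidelines:

Classification with balanced classes: accuracy
Classification with imbalanced classes: precision, recall, F1 score, or AUC (under heavy imbalance, also check the PR curve)
Ranking quality matters: AUC
False positives are costly: focus on precision
False negatives are costly: focus on recall
Regression: MSE, RMSE, MAE, or R²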

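Cross-Validation: More Reliable Evaluation

A single train/test split can give unstable results. Cross-validation provides a more reliable estimate.

K-Fold Cross-Validation

K-fold cross-validation splits the data into K parts. Each round trains on K-1 parts and tests on the remaining one; this repeats K times, and the scores are averaged.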

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.ensemble import RandomForestClassifier

# Generate data
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Create the model
model = RandomForestClassifier(random_state=42)

# 5-fold cross-validation
# (for a classifier, an integer cv makes sklearn use stratified folds)
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
print(f"5-fold cross-validation accuracy: {scores}")
print(f"Mean accuracy: {scores.mean():.4f} (+/- {scores.std() * 2:.4f})")

# Evaluate several metrics at once
scoring = ['accuracy', 'precision', 'recall', 'f1', 'roc_auc']
cv_results = cross_validate(model, X, y, cv=5, scoring=scoring)

print("\nMulti-metric cross-validation results:")
for metric in scoring:
    scores = cv_results[f'test_{metric}']
    print(f"{metric}: {scores.mean():.4f} (+/- {scores.std() * 2:.4f})")

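Stratified K-Fold Cross-Validation

For class-imbalanced data, use stratified K-fold cross-validation, which keeps the class proportions in each fold approximately the same as in the full dataset.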

# Continues from the K-fold example above (reuses X and y)
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Create the stratified K-fold splitter
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit a fresh model on each fold and record its test accuracy
    model = RandomForestClassifier(random_state=42)
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    scores.append(score)

print(f"Stratified 5-fold accuracy: {np.mean(scores):.4f} (+/- {np.std(scores) * 2:.4f})")
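What stratification changes can be made visible by counting the positive samples that land in each test fold. The sketch below uses a synthetic label vector sorted by class and no shuffling, a deliberately unfavorable setup chosen for illustration; `positives_per_fold` is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# 100 samples sorted by class: 90 negatives followed by 10 positives
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # feature values do not affect how folds are formed

def positives_per_fold(splitter):
    """Count the positive samples in the test part of each fold."""
    return [int(y[test_idx].sum()) for _, test_idx in splitter.split(X, y)]

kfold_counts = positives_per_fold(KFold(n_splits=5))
stratified_counts = positives_per_fold(StratifiedKFold(n_splits=5))

print(f"KFold positives per test fold:           {kfold_counts}")
print(f"StratifiedKFold positives per test fold: {stratified_counts}")
```

Plain `KFold` puts all 10 positives into the last fold, so four folds contain no positives to test on, and the model evaluated on the last fold is trained without any. `StratifiedKFold` places 2 positives in every fold, preserving the 10% positive rate.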

Case Study: A Complete Model Evaluation Workflow

The following example shows how to evaluate a model systematically:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score, roc_curve, precision_recall_curve)
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data
data = load_breast_cancer()
X, y = data.data, data.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict labels and the probability of class 1 ('benign' in this dataset)
y_pred = model.predict(X_test)
y_scores = model.predict_proba(X_test)[:, 1]

print("=" * 60)
print("Model Evaluation Report")
print("=" * 60)

# 1. Basic metrics
print("\n1. Classification report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))

# 2. Confusion matrix
print("\n2. Confusion matrix:")
cm = confusion_matrix(y_test, y_pred)
print(cm)

# 3. AUC score
auc_score = roc_auc_score(y_test, y_scores)
print(f"\n3. AUC score: {auc_score:.4f}")

# 4. Cross-validation on the full dataset
cv_scores = cross_val_score(model, X, y, cv=5, scoring='roc_auc')
print(f"\n4. 5-fold cross-validation AUC: {cv_scores.mean():.4f} (+/- {cv_scores.std() * 2:.4f})")

# Visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Confusion matrix
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[0, 0],
            xticklabels=data.target_names, yticklabels=data.target_names)
axes[0, 0].set_xlabel('Predicted label')
axes[0, 0].set_ylabel('True label')
axes[0, 0].set_title('Confusion Matrix')

# ROC curve
fpr, tpr, _ = roc_curve(y_test, y_scores)
axes[0, 1].plot(fpr, tpr, lw=2, label=f'AUC = {auc_score:.2f}')
axes[0, 1].plot([0, 1], [0, 1], 'k--', lw=2)
axes[0, 1].set_xlabel('False Positive Rate')
axes[0, 1].set_ylabel('True Positive Rate')
axes[0, 1].set_title('ROC Curve')
axes[0, 1].legend()
axes[0, 1].grid(True)

# PR curve
precision, recall, _ = precision_recall_curve(y_test, y_scores)
axes[1, 0].plot(recall, precision, lw=2)
axes[1, 0].set_xlabel('Recall')
axes[1, 0].set_ylabel('Precision')
axes[1, 0].set_title('PR Curve')
axes[1, 0].grid(True)

# Feature importance
feature_importance = model.feature_importances_
# Indices of the 10 most important features, in ascending order of importance
indices = np.argsort(feature_importance)[-10:]
axes[1, 1].barh(range(len(indices)), feature_importance[indices])
axes[1, 1].set_yticks(range(len(indices)))
axes[1, 1].set_yticklabels([data.feature_names[i] for i in indices])
axes[1, 1].set_xlabel('Importance')
axes[1, 1].set_title('Top 10 Feature Importances')

plt.tight_layout()
plt.show()