ML: AUC and GAUC

This post introduces AUC and GAUC, with implementations in Python and SQL.


References

Code implementation

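def calculate_auc(ground_truth, predictions):
    """Compute AUC by counting (positive, negative) pairs that are ranked correctly."""
    # Sort the labels by predicted score, from highest to lowest
    sorted_labels = [label for _, label in sorted(zip(predictions, ground_truth), reverse=True)]
    # Count the positive and negative samples
    positive_count = sum(ground_truth)
    negative_count = len(ground_truth) - positive_count
    if positive_count == 0 or negative_count == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    neg_found_count = 0   # negatives already seen, i.e. ranked above the current sample
    pos_gt_neg_count = 0  # (positive, negative) pairs in which the positive is ranked higher
    for label in sorted_labels:
        if label == 1:
            # Every negative not yet seen is ranked below this positive
            pos_gt_neg_count += negative_count - neg_found_count
        else:
            neg_found_count += 1

    # Compute AUC. Note: tied scores get no half credit here; a positive tied
    # with a negative is sorted first and therefore counted as a full win.
    auc = pos_gt_neg_count / (positive_count * negative_count)

    return auc

# Ground-truth labels
ground_truth = [1, 0, 1, 0, 1, 1]
# Predicted scores
predictions = [0.5, 0.3, 0.1, 0.2, 0.8, 0.9]

# Compute AUC
auc = calculate_auc(ground_truth, predictions)

print("AUC:", auc)  # prints: AUC: 0.75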
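
As a cross-check on the counting approach in this section, the sketch below computes AUC directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive sample receives the higher score. The function name `pairwise_auc` and the rule that a tie counts as half a win are assumptions of this sketch rather than part of the original code. It runs in O(n1 * n0) time, so it is only suitable for small inputs.

```python
from itertools import product


def pairwise_auc(ground_truth, predictions):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A positive and a negative with equal scores count as half a win
    (an assumption of this sketch).
    """
    pos = [p for y, p in zip(ground_truth, predictions) if y == 1]
    neg = [p for y, p in zip(ground_truth, predictions) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")

    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Same example data as used in this section
ground_truth = [1, 0, 1, 0, 1, 1]
predictions = [0.5, 0.3, 0.1, 0.2, 0.8, 0.9]
print("pairwise AUC:", pairwise_auc(ground_truth, predictions))
```

On data without tied scores this should agree with the sort-and-count version; the two can differ only when a positive and a negative share the same score.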

SQL implementation

  • For details, see: 深入理解AUC (Understanding AUC in Depth)

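  • Derivation idea:

    • For each positive sample, compute the probability that it scores higher than a negative sample (number of negatives ranked below that positive / total number of negatives)
    • Average these probabilities over all positive samples
    • With ranks assigned in ascending score order, the number of negatives below a positive of rank r is r minus its position among the positives, so the sum over all positives is ry - n1*(n1+1)/2, where ry is the rank sum of the positives, n1 the number of positives, and n0 the number of negatives
  • SQL implementation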

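    -- AUC via the rank-sum formula: (rank sum of positives - n1*(n1+1)/2) / (n0*n1)
    select
        (ry - 0.5*n1*(n1+1))/n0/n1 as auc
    from (
        select
            sum(if(y=0, 1, 0)) as n0,  -- number of negative samples
            sum(if(y=1, 1, 0)) as n1,  -- number of positive samples
            sum(if(y=1, r, 0)) as ry   -- sum of the ranks of the positive samples
        from (
            -- rank 1 = lowest score
            select y, row_number() over(order by score asc) as r
            from (
                select y, score
                from some.table
            ) A
        ) B
    ) C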
  • SQL implementation (per scene, with pcoc, i.e. predicted CTR divided by observed CTR)

    select
        scene,
        (ry - 0.5*n1*(n1+1))/n0/n1 as auc,
        n1/(n1+n0) as ctr,            -- observed CTR
        pctr,                         -- mean predicted CTR
        pctr/(n1/(n1+n0)) as pcoc     -- predicted CTR / observed CTR
    from (
        select
            scene,
            sum(if(y=0, 1, 0)) as n0,
            sum(if(y=1, 1, 0)) as n1,
            sum(if(y=1, r, 0)) as ry,
            avg(score) as pctr
        from (
            -- rank within each scene, 1 = lowest score
            select scene, score, y, row_number() over(partition by scene order by score asc) as r
            from (
                select scene, y, score
                from some.table
            ) A
        ) B
        group by scene
    ) C
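
The per-scene query can be mirrored in Python, which also makes it easy to combine the per-group AUCs into a single GAUC figure. The following is a minimal sketch under stated assumptions, not a reference implementation. The function names (`rank_sum_auc`, `per_scene_metrics`, `gauc`), the scene names, and the sample rows are all hypothetical. GAUC is taken here to be the sample-count-weighted mean of the per-group AUCs, with single-class groups skipped; this weighting is an assumption, since the post does not specify one, and in practice the groups are often users rather than scenes. Unlike `row_number()`, the sketch gives tied scores their average rank.

```python
from collections import defaultdict


def rank_sum_auc(labels, scores):
    """AUC = (ry - n1*(n1+1)/2) / (n0*n1), where ry is the rank sum of the positives.

    Ranks are 1-based in ascending score order; tied scores share their
    average rank (an addition of this sketch).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of samples tied with order[i]
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n1 = sum(labels)
    n0 = len(labels) - n1
    if n0 == 0 or n1 == 0:
        return None  # AUC is undefined for a single-class group
    ry = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (ry - 0.5 * n1 * (n1 + 1)) / (n0 * n1)


def per_scene_metrics(rows):
    """rows: iterable of (scene, y, score). Returns {scene: metrics dict}."""
    groups = defaultdict(list)
    for scene, y, score in rows:
        groups[scene].append((y, score))

    out = {}
    for scene, pairs in groups.items():
        labels = [y for y, _ in pairs]
        scores = [s for _, s in pairs]
        ctr = sum(labels) / len(labels)    # observed CTR
        pctr = sum(scores) / len(scores)   # mean predicted CTR
        out[scene] = {
            "n": len(labels),
            "auc": rank_sum_auc(labels, scores),
            "ctr": ctr,
            "pctr": pctr,
            "pcoc": pctr / ctr if ctr > 0 else None,
        }
    return out


def gauc(metrics):
    """Sample-count-weighted mean of per-group AUCs, skipping undefined groups."""
    valid = [(m["n"], m["auc"]) for m in metrics.values() if m["auc"] is not None]
    total = sum(n for n, _ in valid)
    return sum(n * a for n, a in valid) / total if total else None


# Hypothetical sample rows: (scene, label, predicted score)
rows = [
    ("feed", 1, 0.5), ("feed", 0, 0.3), ("feed", 1, 0.1),
    ("feed", 0, 0.2), ("feed", 1, 0.8), ("feed", 1, 0.9),
    ("search", 1, 0.7), ("search", 0, 0.4), ("search", 0, 0.6), ("search", 1, 0.2),
]
metrics = per_scene_metrics(rows)
for scene, m in metrics.items():
    print(scene, m)
print("GAUC:", gauc(metrics))
```

Returning `None` for single-class groups, instead of raising, keeps one degenerate group from aborting the whole evaluation; whether such groups should be dropped or handled differently depends on the evaluation protocol in use.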