Applying a function at a cumulative week level in pandas-CodePudding

Below is my dataframe(df), where i have instances at a weekly level of expected and actuals

I need to apply the below Python code function to calculate the auc at a cumulative weekly level.

from sklearn.metrics import roc_auc_score
   def auc_group(df):
      y_hat = df.expected
      y = df.actual
      return (roc_auc_score(y,y_hat))

For each week, the scoring should happen for all records until that week, so that result should look something like below.

While it is simple to apply the function for each week, applying that to do a cumulative score, considering all the records until that week, has been a challenge for me. Any help in solving this is appreciated.

CodePudding user response：

If need cumulative count and cumulative roc_auc_score use:

from sklearn.metrics import roc_auc_score

s1 = df['week'].value_counts().sort_index().cumsum()

expected, actual = [],[]
def f(x):
    expected.extend(x['expected'].tolist())
    actual.extend(x['actual'].tolist())
    return roc_auc_score(actual, expected)

s2 = df.groupby('week').apply(f)

df = (pd.concat([s1, s2], axis=1, keys=('count of records','AUC'))
        .rename_axis('week')
        .reset_index())
print (df)
   week  count of records       AUC
0    10                 2  1.000000
1    11                 6  0.500000
2    12                10  0.583333

If need cumulative count and roc_auc_score per groups (not cumulative) use:

from sklearn.metrics import roc_auc_score

s1 = df['week'].value_counts().sort_index().cumsum()
s2 = df.groupby('week').apply(lambda x: roc_auc_score(x.actual,x.expected))

df = (pd.concat([s1, s2], axis=1, keys=('count of records','AUC'))
        .rename_axis('week')
        .reset_index())
print (df)
   week  count of records       AUC
0    10                 2  1.000000
1    11                 6  0.166667
2    12                10  0.750000

CodePudding user response：

import pandas as pd
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

data = pd.concat([pd.DataFrame(iris.target, columns = ['target']), 
pd.DataFrame(iris.data, columns = iris.feature_names)], axis = 1)

def target_replace(x):
    for i in [0,1,2]:
        if x == i:
            return(iris.target_names[i])
    
data.target = data.target.apply(target_replace)