Home > Mobile >  How to rank over multiple rows and columns (dataframe)
How to rank over multiple rows and columns (dataframe)

Time:08-30

Small sample of data:

df = pd.DataFrame({'Customer_ID': [1,1,1,1, 2,2,2,2], 'Customer Age' : [15,15,15,15, 50,50,50,50],   'Similar Customer ID': [5, 8, 11, 15, 22, 28, 31, 11], 'Age': [40, 43, 37, 11, 29, 42, 14, 33],
})

Question

How do I add a column that rank the customer with related customers, i.e., Customer 1 is 15 years old, I want the desired output to rank that customer as second youngest compared to 'Similar Customer':

df = pd.DataFrame({'Customer_ID': [1,1,1,1, 2,2,2,2], 'Customer Age' : [15,15,15,15, 50,50,50,50],   'Similar Customer ID': [5, 8, 11, 15, 22, 28, 31, 11], 'Age': [40, 43, 37, 11, 29, 42, 14, 33], 'Age Rank': [2, 2, 2, 2, 5,5,5,5]
})

CodePudding user response:

You can just compare the Customer Age to Age then take the sum (over each customer):

df['Age Rank'] = (df['Customer Age'].ge(df['Age'])
                    .groupby(df['Customer_ID'])
                    .transform('sum')   1
                 )
print(df)

Output:

   Customer_ID  Customer Age  Similar Customer ID  Age  Age Rank
0            1            15                    5   40         2
1            1            15                    8   43         2
2            1            15                   11   37         2
3            1            15                   15   11         2
4            2            50                   22   29         5
5            2            50                   28   42         5
6            2            50                   31   14         5
7            2            50                   11   33         5
  • Related