I have the below dataframe called df:
Id | Stage1 | Stage2 | Stage3 |
---|---|---|---|
1 | 2022-02-01 | 2020-04-03 | 2022-06-07 |
--- | ------------ | ------------ | ----------- |
2 | 2023-06-07 | 2020-03-01 | 2020-09-03 |
--- | ------------ | ------------ | ----------- |
3 | 2023-02-04 | 2023-06-07 | 2022-06-07 |
--- | ------------ | ------------ | ----------- |
4 | 2022-05-08 | 2023-09-01 | 2023-09-01 |
I need to calculate the max date for each ID and its respective Stage. So for Order 1,2,3,4 the Stages I need are Stage 3, Stage 1, Stage 2, and Stage 3 respectively. I understand that using
df.filter(like="stage").idxmax(axis=1)
Finds the first occurrence of max date in a row and gives me its column name. However, for Order 4, Stage 2 and 3 have the same date. I need Stage 3 as my answer as Stage 3 is the latest stage of the order. How is this possible?
CodePudding user response:
Swap order of columns for match latest maximal value:
s = df.filter(like="Stage").iloc[:, ::-1].idxmax(axis=1)
print (s)
0 Stage3
1 Stage1
2 Stage2
3 Stage3
dtype: object
Details:
print (df.filter(like="Stage").iloc[:, ::-1])
Stage3 Stage2 Stage1
0 2022-06-07 2020-04-03 2022-02-01
1 2020-09-03 2020-03-01 2023-06-07
2 2022-06-07 2023-06-07 2023-02-04
3 2023-09-01 2023-09-01 2022-05-08