Home > Mobile >  tagging the column based on conditions
tagging the column based on conditions

Time:11-16

Existing Dataframe :

Id        Month       Year        processed      success
A          Jan        2021             0           0
A          Feb        2021             0           1
A          Mar        2021             1           0
B          Jan        2021             0           1
B          Feb        2021             0           0
B          Mar        2021             0           0
B          Apr        2021             0           0
C          Dec        2021             0           0
C          Jan        2022             0           0
C          Feb        2022             1           0

Expected Dataframe :

  Id            final_status
  A             Paid
  B             UnPaid
  C             Paid

I am trying to create a Tag as Paid and UnPaid for the Id. Condition for UnPaid is to check for the last three consecutive month if either success or Processed doen't have any count it is to be flagged as "Unpaid" else "Paid"

stuck with applying conditions.

CodePudding user response:

First create helper Series for test if not 1 in both columns by DataFrame.ne and DataFrame.all and then aggregate by GroupBy.agg with numpy.where:

df1 = (df[['processed','success']].ne(1).all(axis=1)
        .groupby(df['Id']).agg(lambda x: np.where(x[-3:].all(), 'UnPaid', 'Paid'))
        .reset_index(name='final_status')
        )
print (df1)
  Id final_status
0  A         Paid
1  B       UnPaid
2  C         Paid

Details:

print (df[['processed','success']].ne(1).all(axis=1))
0     True
1    False
2    False
3    False
4     True
5     True
6     True
7     True
8     True
9    False
dtype: bool

CodePudding user response:

You can get the last 3 rows per group, and check whether there is at least one 1 in the whole slice.

For this use groupby.apply, in which the last 3 rows/values are selected with tail, compared to 1 with eq and finally check if there is at least one occurrence with all.

One option:

N = 3
(df.groupby('Id')
   .apply(lambda d: d.tail(N).eq(1).any(axis=1).any())
   .map({True: 'Paid', False: 'UnPaid'})
   .reset_index(name='final_status')
)

Or:

N = 3
(df[['processed', 'success']]
 .eq(1).any(axis=1)
 .groupby(df['Id']).apply(lambda s: s.tail(N).any())
 .map({True: 'Paid', False: 'UnPaid'})
 .reset_index(name='final_status')
)

Another with :

N = 3
(df.groupby('Id')
   .apply(lambda d: np.where(d.eq(1).to_numpy()[-N:].any(), 'Paid', 'UnPaid'))
   .reset_index(name='final_status')
)

Output:

  Id final_status
0  A         Paid
1  B       UnPaid
2  C         Paid
  • Related