Existing Dataframe :
Id Month Year processed success
A Jan 2021 0 0
A Feb 2021 0 1
A Mar 2021 1 0
B Jan 2021 0 1
B Feb 2021 0 0
B Mar 2021 0 0
B Apr 2021 0 0
C Dec 2021 0 0
C Jan 2022 0 0
C Feb 2022 1 0
Expected Dataframe :
Id final_status
A Paid
B UnPaid
C Paid
I am trying to create a Tag as Paid and UnPaid for the Id. Condition for UnPaid is to check for the last three consecutive month if either success or Processed doen't have any count it is to be flagged as "Unpaid" else "Paid"
stuck with applying conditions.
CodePudding user response:
First create helper Series
for test if not 1
in both columns by DataFrame.ne
and DataFrame.all
and then aggregate by GroupBy.agg
with numpy.where
:
df1 = (df[['processed','success']].ne(1).all(axis=1)
.groupby(df['Id']).agg(lambda x: np.where(x[-3:].all(), 'UnPaid', 'Paid'))
.reset_index(name='final_status')
)
print (df1)
Id final_status
0 A Paid
1 B UnPaid
2 C Paid
Details:
print (df[['processed','success']].ne(1).all(axis=1))
0 True
1 False
2 False
3 False
4 True
5 True
6 True
7 True
8 True
9 False
dtype: bool
CodePudding user response:
You can get the last 3 rows per group, and check whether there is at least one 1 in the whole slice.
For this use groupby.apply
, in which the last 3 rows/values are selected with tail
, compared to 1 with eq
and finally check if there is at least one occurrence with all
.
One option:
N = 3
(df.groupby('Id')
.apply(lambda d: d.tail(N).eq(1).any(axis=1).any())
.map({True: 'Paid', False: 'UnPaid'})
.reset_index(name='final_status')
)
Or:
N = 3
(df[['processed', 'success']]
.eq(1).any(axis=1)
.groupby(df['Id']).apply(lambda s: s.tail(N).any())
.map({True: 'Paid', False: 'UnPaid'})
.reset_index(name='final_status')
)
Another with numpy:
N = 3
(df.groupby('Id')
.apply(lambda d: np.where(d.eq(1).to_numpy()[-N:].any(), 'Paid', 'UnPaid'))
.reset_index(name='final_status')
)
Output:
Id final_status
0 A Paid
1 B UnPaid
2 C Paid