I need to assign correct value (Qualified or Not Qualified) to all rows associated with Client per Month if condition for all of the associated rows is met.
test_data = {'Client Id': [1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2],
'Client Name': ['Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland',
'Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland',
'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt',
'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt',],
'Week': ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022',
'01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
'01/31/2022 - 02/06/2022', '02/07/2022 - 02/13/2022',
'02/14/2022 - 02/20/2022','02/21/2022 - 02/27/2022',
'01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022',
'01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
'01/31/2022 - 02/06/2022', '02/07/2022 - 02/13/2022',
'02/14/2022 - 02/20/2022','02/21/2022 - 02/27/2022'],
'Month': ['January', 'January', 'January', 'January',
"February", "February", "February", "February",
'January', 'January', 'January', 'January',
"February", "February", "February", "February"],
'Year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022,
2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022],
'Payment Status': ["Pending", "Paid in Full", "Didn't Paid", "Paid in Full",
"Paid in Full", "Paid in Full", "Paid in Full",
"Paid in Full", "Paid in Full", "Paid in Full",
"Paid in Full", "Paid in Full", "Paid in Full",
"Paid in Full", "Paid in Full", "Pending"]}
test_df = pd.DataFrame(data=test_data)
Data:
Client Id Client Name Week Month Year Payment Status
1 Tom Holland 01/03/2022 - 01/09/2022 January 2022 Pending
1 Tom Holland 01/10/2022 - 01/16/2022 January 2022 Paid in Full
1 Tom Holland 01/17/2022 - 01/23/2022 January 2022 Didn't Paid
1 Tom Holland 01/24/2022 - 01/30/2022 January 2022 Paid in Full
1 Tom Holland 01/31/2022 - 02/06/2022 February 2022 Paid in Full
1 Tom Holland 02/07/2022 - 02/13/2022 February 2022 Paid in Full
1 Tom Holland 02/14/2022 - 02/20/2022 February 2022 Paid in Full
1 Tom Holland 02/21/2022 - 02/27/2022 February 2022 Paid in Full
2 Brad Pitt 01/03/2022 - 01/09/2022 January 2022 Paid in Full
2 Brad Pitt 01/10/2022 - 01/16/2022 January 2022 Paid in Full
2 Brad Pitt 01/17/2022 - 01/23/2022 January 2022 Paid in Full
2 Brad Pitt 01/24/2022 - 01/30/2022 January 2022 Paid in Full
2 Brad Pitt 01/31/2022 - 02/06/2022 February 2022 Paid in Full
2 Brad Pitt 02/07/2022 - 02/13/2022 February 2022 Paid in Full
2 Brad Pitt 02/14/2022 - 02/20/2022 February 2022 Paid in Full
2 Brad Pitt 02/21/2022 - 02/27/2022 February 2022 Pending
If every row (Week) associated with Client per each Month is Paid in Full, then Qualified assigned to all rows (Weeks) associated with Client per each Month. Even if 1 Week is not Paid in Full (3 weeks can be Paid in Full, but 1 Didn't Paid or Pending), then all rows assigned to Not Qualified.
Desired output:
Client Id Client Name Week Month Year Payment Status Qualification
1 Tom Holland 01/03/2022 - 01/09/2022 January 2022 Pending Not Qualified
1 Tom Holland 01/10/2022 - 01/16/2022 January 2022 Paid in Full Not Qualified
1 Tom Holland 01/17/2022 - 01/23/2022 January 2022 Didn't Paid Not Qualified
1 Tom Holland 01/24/2022 - 01/30/2022 January 2022 Paid in Full Not Qualified
1 Tom Holland 01/31/2022 - 02/06/2022 February 2022 Paid in Full Qualified
1 Tom Holland 02/07/2022 - 02/13/2022 February 2022 Paid in Full Qualified
1 Tom Holland 02/14/2022 - 02/20/2022 February 2022 Paid in Full Qualified
1 Tom Holland 02/21/2022 - 02/27/2022 February 2022 Paid in Full Qualified
2 Brad Pitt 01/03/2022 - 01/09/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/10/2022 - 01/16/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/17/2022 - 01/23/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/24/2022 - 01/30/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/31/2022 - 02/06/2022 February 2022 Paid in Full Not Qualified
2 Brad Pitt 02/07/2022 - 02/13/2022 February 2022 Paid in Full Not Qualified
2 Brad Pitt 02/14/2022 - 02/20/2022 February 2022 Paid in Full Not Qualified
2 Brad Pitt 02/21/2022 - 02/27/2022 February 2022 Pending Not Qualified
I don't know how to achieve this, I though about value_counts in the loop:
for name, month in zip(list(test_df["Client Name"].unique()), list(test_df["Month"])):
print(test_df[(test_df["Client Name"] == name) & (test_df["Month"] == month)].value_counts(["Payment Status"]))
CodePudding user response:
The key is to create a boolean mask: if Payment Status
is "Paid in full" then True
else False
. Now you can group by Client Id
, Month
AND Year
to check if all values are True
. Use transform
to broadcast the result to every row of the group. Finally, replace True/False
by its respective values.
The boolean mask is created dynamically by adding a new column is_paid
to the dataframe:
df['Qualification'] = (
df.assign(is_paid=df['Payment Status'] == 'Paid in Full')
.groupby(['Client Id', 'Month', 'Year'])['is_paid']
.transform('all').replace({True: 'Qualified', False: 'Not Qualified'})
)
print(df)
# Output
Client Id Client Name Week Month Year Payment Status Qualification
0 1 Tom Holland 01/03/2022 - 01/09/2022 January 2022 Pending Not Qualified
1 1 Tom Holland 01/10/2022 - 01/16/2022 January 2022 Paid in Full Not Qualified
2 1 Tom Holland 01/17/2022 - 01/23/2022 January 2022 Didn't Paid Not Qualified
3 1 Tom Holland 01/24/2022 - 01/30/2022 January 2022 Paid in Full Not Qualified
4 1 Tom Holland 01/31/2022 - 02/06/2022 February 2022 Paid in Full Qualified
5 1 Tom Holland 02/07/2022 - 02/13/2022 February 2022 Paid in Full Qualified
6 1 Tom Holland 02/14/2022 - 02/20/2022 February 2022 Paid in Full Qualified
7 1 Tom Holland 02/21/2022 - 02/27/2022 February 2022 Paid in Full Qualified
8 2 Brad Pitt 01/03/2022 - 01/09/2022 January 2022 Paid in Full Qualified
9 2 Brad Pitt 01/10/2022 - 01/16/2022 January 2022 Paid in Full Qualified
10 2 Brad Pitt 01/17/2022 - 01/23/2022 January 2022 Paid in Full Qualified
11 2 Brad Pitt 01/24/2022 - 01/30/2022 January 2022 Paid in Full Qualified
12 2 Brad Pitt 01/31/2022 - 02/06/2022 February 2022 Paid in Full Not Qualified
13 2 Brad Pitt 02/07/2022 - 02/13/2022 February 2022 Paid in Full Not Qualified
14 2 Brad Pitt 02/14/2022 - 02/20/2022 February 2022 Paid in Full Not Qualified
15 2 Brad Pitt 02/21/2022 - 02/27/2022 February 2022 Pending Not Qualified
CodePudding user response:
First convert Payment Status
to bools:
test_df['Paid'] = test_df['Payment Status'] == 'Paid in Full'
>>> test_df
Client Id Client Name Week Month Year Payment Status Paid
1 Tom Holland 01/03/2022 - 01/09/2022 January 2022 Pending FALSE
1 Tom Holland 01/10/2022 - 01/16/2022 January 2022 Paid in Full TRUE
1 Tom Holland 01/17/2022 - 01/23/2022 January 2022 Didnt Paid FALSE
1 Tom Holland 01/24/2022 - 01/30/2022 January 2022 Paid in Full TRUE
1 Tom Holland 01/31/2022 - 02/06/2022 February2022 Paid in Full TRUE
1 Tom Holland 02/07/2022 - 02/13/2022 February2022 Paid in Full TRUE
1 Tom Holland 02/14/2022 - 02/20/2022 February2022 Paid in Full TRUE
1 Tom Holland 02/21/2022 - 02/27/2022 February2022 Paid in Full TRUE
2 Brad Pitt 01/03/2022 - 01/09/2022 January 2022 Paid in Full TRUE
2 Brad Pitt 01/10/2022 - 01/16/2022 January 2022 Paid in Full TRUE
2 Brad Pitt 01/17/2022 - 01/23/2022 January 2022 Paid in Full TRUE
2 Brad Pitt 01/24/2022 - 01/30/2022 January 2022 Paid in Full TRUE
2 Brad Pitt 01/31/2022 - 02/06/2022 February2022 Paid in Full TRUE
2 Brad Pitt 02/07/2022 - 02/13/2022 February2022 Paid in Full TRUE
2 Brad Pitt 02/14/2022 - 02/20/2022 February2022 Paid in Full TRUE
2 Brad Pitt 02/21/2022 - 02/27/2022 February2022 Pending FALSE
The group them by Month
and Client Id
and check all Paid
values in a group is True
:
status = test_df[["Client Id", "Month", "Paid"]].groupby(["Month", "Client Id"]).all()
>>> status
Paid
Month Client Id
February 1 True
2 False
January 1 False
2 True
Now reset index and convert Paid
back to text (Qualified
or Not Qualified
):
status = status.reset_index()
status['Paid'] = status['Paid'].map({True: 'Qualified', False:"Not Qualified"})
>>> status
Month Client Id Paid
February 1 Qualified
February 2 Not Qualified
January 1 Not Qualified
January 2 Qualified
Now merge with original table to get the desired results (and drop unnecessary columns made by merging. Also rename the new column:
output = pd.merge(test_df, a, on=['Client Id', 'Month']).drop(columns='Paid_x')
output = output.rename(columns={'Paid_y': 'Qualification'})
>>> output
Client Id Client Name Week Month Year Payment Status Qualification
1 Tom Holland 01/03/2022 - 01/09/2022 January 2022 Pending Not Qualified
1 Tom Holland 01/10/2022 - 01/16/2022 January 2022 Paid in Full Not Qualified
1 Tom Holland 01/17/2022 - 01/23/2022 January 2022 Didnt Paid Not Qualified
1 Tom Holland 01/24/2022 - 01/30/2022 January 2022 Paid in Full Not Qualified
1 Tom Holland 01/31/2022 - 02/06/2022 February2022 Paid in Full Qualified
1 Tom Holland 02/07/2022 - 02/13/2022 February2022 Paid in Full Qualified
1 Tom Holland 02/14/2022 - 02/20/2022 February2022 Paid in Full Qualified
1 Tom Holland 02/21/2022 - 02/27/2022 February2022 Paid in Full Qualified
2 Brad Pitt 01/03/2022 - 01/09/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/10/2022 - 01/16/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/17/2022 - 01/23/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/24/2022 - 01/30/2022 January 2022 Paid in Full Qualified
2 Brad Pitt 01/31/2022 - 02/06/2022 February2022 Paid in Full Not Qualified
2 Brad Pitt 02/07/2022 - 02/13/2022 February2022 Paid in Full Not Qualified
2 Brad Pitt 02/14/2022 - 02/20/2022 February2022 Paid in Full Not Qualified
2 Brad Pitt 02/21/2022 - 02/27/2022 February2022 Pending Not Qualified