Home > database >  Assign correct Qualification to all rows associated with Client per Month - Python / Pandas
Assign correct Qualification to all rows associated with Client per Month - Python / Pandas

Time:04-02

I need to assign correct value (Qualified or Not Qualified) to all rows associated with Client per Month if condition for all of the associated rows is met.

test_data = {'Client Id': [1,1,1,1,1,1,1,1, 
                           2,2,2,2,2,2,2,2],
            'Client Name': ['Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland', 
                           'Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland',
                           'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt',       
                           'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt',],
             'Week': ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', 
                     '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
                     '01/31/2022 - 02/06/2022', '02/07/2022 - 02/13/2022',     
                     '02/14/2022 - 02/20/2022','02/21/2022 - 02/27/2022',
                     '01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', 
                     '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
                     '01/31/2022 - 02/06/2022', '02/07/2022 - 02/13/2022', 
                     '02/14/2022 - 02/20/2022','02/21/2022 - 02/27/2022'], 
            'Month': ['January', 'January', 'January', 'January',              
                      "February", "February", "February", "February",
                      'January', 'January', 'January', 'January', 
                      "February", "February", "February", "February"], 
            'Year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022,
                     2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022],
            'Payment Status': ["Pending", "Paid in Full", "Didn't Paid", "Paid in Full", 
                              "Paid in Full", "Paid in Full", "Paid in Full", 
                              "Paid in Full", "Paid in Full", "Paid in Full", 
                              "Paid in Full", "Paid in Full", "Paid in Full", 
                              "Paid in Full", "Paid in Full", "Pending"]}

test_df = pd.DataFrame(data=test_data)

Data:

Client Id      Client Name           Week              Month    Year    Payment Status

    1         Tom Holland   01/03/2022 - 01/09/2022   January   2022       Pending
    1         Tom Holland   01/10/2022 - 01/16/2022   January   2022    Paid in Full
    1         Tom Holland   01/17/2022 - 01/23/2022   January   2022     Didn't Paid
    1         Tom Holland   01/24/2022 - 01/30/2022   January   2022    Paid in Full
    1         Tom Holland   01/31/2022 - 02/06/2022   February  2022    Paid in Full
    1         Tom Holland   02/07/2022 - 02/13/2022   February  2022    Paid in Full
    1         Tom Holland   02/14/2022 - 02/20/2022   February  2022    Paid in Full
    1         Tom Holland   02/21/2022 - 02/27/2022   February  2022    Paid in Full
    2          Brad Pitt    01/03/2022 - 01/09/2022   January   2022    Paid in Full
    2          Brad Pitt    01/10/2022 - 01/16/2022   January   2022    Paid in Full
    2          Brad Pitt    01/17/2022 - 01/23/2022   January   2022    Paid in Full
    2          Brad Pitt    01/24/2022 - 01/30/2022   January   2022    Paid in Full
    2          Brad Pitt    01/31/2022 - 02/06/2022   February  2022    Paid in Full
    2          Brad Pitt    02/07/2022 - 02/13/2022   February  2022    Paid in Full
    2          Brad Pitt    02/14/2022 - 02/20/2022   February  2022    Paid in Full
    2          Brad Pitt    02/21/2022 - 02/27/2022   February  2022      Pending

If every row (Week) associated with Client per each Month is Paid in Full, then Qualified assigned to all rows (Weeks) associated with Client per each Month. Even if 1 Week is not Paid in Full (3 weeks can be Paid in Full, but 1 Didn't Paid or Pending), then all rows assigned to Not Qualified.

Desired output:

Client Id  Client Name           Week              Month    Year    Payment Status  Qualification

    1      Tom Holland  01/03/2022 - 01/09/2022   January   2022       Pending     Not Qualified
    1      Tom Holland  01/10/2022 - 01/16/2022   January   2022    Paid in Full   Not Qualified
    1      Tom Holland  01/17/2022 - 01/23/2022   January   2022     Didn't Paid   Not Qualified
    1      Tom Holland  01/24/2022 - 01/30/2022   January   2022    Paid in Full   Not Qualified
    1      Tom Holland  01/31/2022 - 02/06/2022   February  2022    Paid in Full    Qualified
    1      Tom Holland  02/07/2022 - 02/13/2022   February  2022    Paid in Full    Qualified
    1      Tom Holland  02/14/2022 - 02/20/2022   February  2022    Paid in Full    Qualified
    1      Tom Holland  02/21/2022 - 02/27/2022   February  2022    Paid in Full    Qualified
    2       Brad Pitt   01/03/2022 - 01/09/2022   January   2022    Paid in Full    Qualified
    2       Brad Pitt   01/10/2022 - 01/16/2022   January   2022    Paid in Full    Qualified
    2       Brad Pitt   01/17/2022 - 01/23/2022   January   2022    Paid in Full    Qualified
    2       Brad Pitt   01/24/2022 - 01/30/2022   January   2022    Paid in Full    Qualified
    2       Brad Pitt   01/31/2022 - 02/06/2022   February  2022    Paid in Full   Not Qualified
    2       Brad Pitt   02/07/2022 - 02/13/2022   February  2022    Paid in Full   Not Qualified
    2       Brad Pitt   02/14/2022 - 02/20/2022   February  2022    Paid in Full   Not Qualified
    2       Brad Pitt   02/21/2022 - 02/27/2022   February  2022      Pending      Not Qualified

I don't know how to achieve this, I though about value_counts in the loop:

for name, month in zip(list(test_df["Client Name"].unique()), list(test_df["Month"])):
    print(test_df[(test_df["Client Name"] == name) & (test_df["Month"] == month)].value_counts(["Payment Status"]))

CodePudding user response:

The key is to create a boolean mask: if Payment Status is "Paid in full" then True else False. Now you can group by Client Id, Month AND Year to check if all values are True. Use transform to broadcast the result to every row of the group. Finally, replace True/False by its respective values.

The boolean mask is created dynamically by adding a new column is_paid to the dataframe:

df['Qualification'] = (
    df.assign(is_paid=df['Payment Status'] == 'Paid in Full')
      .groupby(['Client Id', 'Month', 'Year'])['is_paid']
      .transform('all').replace({True: 'Qualified', False: 'Not Qualified'})
)
print(df)

# Output
    Client Id  Client Name                     Week     Month  Year Payment Status  Qualification
0           1  Tom Holland  01/03/2022 - 01/09/2022   January  2022        Pending  Not Qualified
1           1  Tom Holland  01/10/2022 - 01/16/2022   January  2022   Paid in Full  Not Qualified
2           1  Tom Holland  01/17/2022 - 01/23/2022   January  2022    Didn't Paid  Not Qualified
3           1  Tom Holland  01/24/2022 - 01/30/2022   January  2022   Paid in Full  Not Qualified
4           1  Tom Holland  01/31/2022 - 02/06/2022  February  2022   Paid in Full      Qualified
5           1  Tom Holland  02/07/2022 - 02/13/2022  February  2022   Paid in Full      Qualified
6           1  Tom Holland  02/14/2022 - 02/20/2022  February  2022   Paid in Full      Qualified
7           1  Tom Holland  02/21/2022 - 02/27/2022  February  2022   Paid in Full      Qualified
8           2    Brad Pitt  01/03/2022 - 01/09/2022   January  2022   Paid in Full      Qualified
9           2    Brad Pitt  01/10/2022 - 01/16/2022   January  2022   Paid in Full      Qualified
10          2    Brad Pitt  01/17/2022 - 01/23/2022   January  2022   Paid in Full      Qualified
11          2    Brad Pitt  01/24/2022 - 01/30/2022   January  2022   Paid in Full      Qualified
12          2    Brad Pitt  01/31/2022 - 02/06/2022  February  2022   Paid in Full  Not Qualified
13          2    Brad Pitt  02/07/2022 - 02/13/2022  February  2022   Paid in Full  Not Qualified
14          2    Brad Pitt  02/14/2022 - 02/20/2022  February  2022   Paid in Full  Not Qualified
15          2    Brad Pitt  02/21/2022 - 02/27/2022  February  2022        Pending  Not Qualified

CodePudding user response:

First convert Payment Status to bools:

test_df['Paid'] = test_df['Payment Status'] == 'Paid in Full'
>>> test_df
Client Id   Client Name Week    Month   Year    Payment         Status          Paid
1           Tom Holland 01/03/2022 - 01/09/2022 January 2022    Pending         FALSE
1           Tom Holland 01/10/2022 - 01/16/2022 January 2022    Paid in Full    TRUE
1           Tom Holland 01/17/2022 - 01/23/2022 January 2022    Didnt Paid      FALSE
1           Tom Holland 01/24/2022 - 01/30/2022 January 2022    Paid in Full    TRUE
1           Tom Holland 01/31/2022 - 02/06/2022 February2022    Paid in Full    TRUE
1           Tom Holland 02/07/2022 - 02/13/2022 February2022    Paid in Full    TRUE
1           Tom Holland 02/14/2022 - 02/20/2022 February2022    Paid in Full    TRUE
1           Tom Holland 02/21/2022 - 02/27/2022 February2022    Paid in Full    TRUE
2           Brad Pitt   01/03/2022 - 01/09/2022 January 2022    Paid in Full    TRUE
2           Brad Pitt   01/10/2022 - 01/16/2022 January 2022    Paid in Full    TRUE
2           Brad Pitt   01/17/2022 - 01/23/2022 January 2022    Paid in Full    TRUE
2           Brad Pitt   01/24/2022 - 01/30/2022 January 2022    Paid in Full    TRUE
2           Brad Pitt   01/31/2022 - 02/06/2022 February2022    Paid in Full    TRUE
2           Brad Pitt   02/07/2022 - 02/13/2022 February2022    Paid in Full    TRUE
2           Brad Pitt   02/14/2022 - 02/20/2022 February2022    Paid in Full    TRUE
2           Brad Pitt   02/21/2022 - 02/27/2022 February2022    Pending         FALSE

The group them by Month and Client Id and check all Paid values in a group is True:

status = test_df[["Client Id", "Month", "Paid"]].groupby(["Month", "Client Id"]).all()
>>> status
                      Paid
Month     Client Id 
February  1           True
          2           False
January   1           False
          2           True

Now reset index and convert Paid back to text (Qualified or Not Qualified):

status = status.reset_index()
status['Paid'] = status['Paid'].map({True: 'Qualified', False:"Not Qualified"})
>>> status

Month   Client Id   Paid
February    1       Qualified
February    2       Not Qualified
January     1       Not Qualified
January     2       Qualified

Now merge with original table to get the desired results (and drop unnecessary columns made by merging. Also rename the new column:

output = pd.merge(test_df, a, on=['Client Id', 'Month']).drop(columns='Paid_x')
output = output.rename(columns={'Paid_y': 'Qualification'})
>>> output
Client Id   Client Name Week                    Month   Year    Payment Status  Qualification
1           Tom Holland 01/03/2022 - 01/09/2022 January 2022    Pending         Not Qualified
1           Tom Holland 01/10/2022 - 01/16/2022 January 2022    Paid in Full    Not Qualified
1           Tom Holland 01/17/2022 - 01/23/2022 January 2022    Didnt Paid      Not Qualified
1           Tom Holland 01/24/2022 - 01/30/2022 January 2022    Paid in Full    Not Qualified
1           Tom Holland 01/31/2022 - 02/06/2022 February2022    Paid in Full    Qualified
1           Tom Holland 02/07/2022 - 02/13/2022 February2022    Paid in Full    Qualified
1           Tom Holland 02/14/2022 - 02/20/2022 February2022    Paid in Full    Qualified
1           Tom Holland 02/21/2022 - 02/27/2022 February2022    Paid in Full    Qualified
2           Brad Pitt   01/03/2022 - 01/09/2022 January 2022    Paid in Full    Qualified
2           Brad Pitt   01/10/2022 - 01/16/2022 January 2022    Paid in Full    Qualified
2           Brad Pitt   01/17/2022 - 01/23/2022 January 2022    Paid in Full    Qualified
2           Brad Pitt   01/24/2022 - 01/30/2022 January 2022    Paid in Full    Qualified
2           Brad Pitt   01/31/2022 - 02/06/2022 February2022    Paid in Full    Not Qualified
2           Brad Pitt   02/07/2022 - 02/13/2022 February2022    Paid in Full    Not Qualified
2           Brad Pitt   02/14/2022 - 02/20/2022 February2022    Paid in Full    Not Qualified
2           Brad Pitt   02/21/2022 - 02/27/2022 February2022    Pending Not     Qualified
  • Related