Home > Enterprise >  Propagate / broadcast a value in groups back to the dataframe column
Propagate / broadcast a value in groups back to the dataframe column

Time:10-15

I have a dataframe as following:

Year-Week   Ticket  Policy       User
2022-35     12502   COMPLIED      MX 
2022-35     12502   NOT COMPLIED  US
2022-35     12502   COMPLIED      AL
2022-35     12502   COMPLIED      BR
2022-35     12471   COMPLIED      DE
2022-35     12409   COMPLIED      US
2022-35     12409   NOT COMPLIED  BR
2022-35     12406   COMPLIED      MX 

When there are duplicate tickets, how could I standardize that each ticket has the same policy? E.g: If a ticket in a week has "NOT COMPLIED", the policy should be the same for that ticket no matter the week, so, should be something like this:

Output Sample

Year-Week   Ticket  Policy       User
2022-35     12502   NOT COMPLIED  MX 
2022-35     12502   NOT COMPLIED  US
2022-35     12502   NOT COMPLIED  AL
2022-35     12502   NOT COMPLIED  BR
2022-35     12471   COMPLIED      DE
2022-35     12409   NOT COMPLIED  US
2022-35     12409   NOT COMPLIED  BR
2022-35     12406   COMPLIED      MX 

Basically what I want is the dominance of NOT COMPLIED over COMPLIED when there are duplicated Tickets.

CodePudding user response:

Leverage the fact that "N" comes after "C" on the ASCII table. So max() would take "NOT COMPLIED" over "COMPLIED".

As you want to propagate the same policy for each Ticket, call max() using groupby().transform() so that the result is indexed the same as the original dataframe, allowing you to directly assign the result back to the dataframe.

df['Policy'] = df.groupby('Ticket')['Policy'].transform('max')
  Year-Week  Ticket        Policy User
0   2022-35   12502  NOT COMPLIED   MX
1   2022-35   12502  NOT COMPLIED   US
2   2022-35   12502  NOT COMPLIED   AL
3   2022-35   12502  NOT COMPLIED   BR
4   2022-35   12471      COMPLIED   DE
5   2022-35   12409  NOT COMPLIED   US
6   2022-35   12409  NOT COMPLIED   BR
7   2022-35   12406      COMPLIED   MX

CodePudding user response:

Try this piece of code;

df1 = df[['Year-Week', 'Ticket']].drop_duplicates()

policy = []
for time,ticket in zip(df1['Year-Week'],df1['Ticket']):
    df_temp = df[(df["Year-Week"] == time) & (df["Ticket"] == ticket)]
    if "NOT COMPLIED" in df_temp["Policy"].to_list():
        policy.extend(["NOT COMPLIED"] * df_temp.shape[0])
    else:
        policy.extend(["COMPLIED"] * df_temp.shape[0])
        
df["Policy"] = policy

Hope this helps...

CodePudding user response:

You can do a groupby followed by a transfrom and set it to the Policy column;

df['Policy'] = df.groupby(['Year-Week','Ticket']).transform(lambda x: 'NOT COMPLIED'if 'NOT COMPLIED' in x else 'COMPLIED')
  • Related