I have a dataframe as following:
Year-Week Ticket Policy User 2022-35 12502 COMPLIED MX 2022-35 12502 NOT COMPLIED US 2022-35 12502 COMPLIED AL 2022-35 12502 COMPLIED BR 2022-35 12471 COMPLIED DE 2022-35 12409 COMPLIED US 2022-35 12409 NOT COMPLIED BR 2022-35 12406 COMPLIED MX
When there are duplicate tickets, how could I standardize that each ticket has the same policy? E.g: If a ticket in a week has "NOT COMPLIED", the policy should be the same for that ticket no matter the week, so, should be something like this:
Output Sample
Year-Week Ticket Policy User 2022-35 12502 NOT COMPLIED MX 2022-35 12502 NOT COMPLIED US 2022-35 12502 NOT COMPLIED AL 2022-35 12502 NOT COMPLIED BR 2022-35 12471 COMPLIED DE 2022-35 12409 NOT COMPLIED US 2022-35 12409 NOT COMPLIED BR 2022-35 12406 COMPLIED MX
Basically what I want is the dominance of NOT COMPLIED over COMPLIED when there are duplicated Tickets.
CodePudding user response:
Leverage the fact that "N"
comes after "C"
on the ASCII table. So max()
would take "NOT COMPLIED" over "COMPLIED".
As you want to propagate the same policy for each Ticket, call max()
using groupby().transform()
so that the result is indexed the same as the original dataframe, allowing you to directly assign the result back to the dataframe.
df['Policy'] = df.groupby('Ticket')['Policy'].transform('max')
Year-Week Ticket Policy User
0 2022-35 12502 NOT COMPLIED MX
1 2022-35 12502 NOT COMPLIED US
2 2022-35 12502 NOT COMPLIED AL
3 2022-35 12502 NOT COMPLIED BR
4 2022-35 12471 COMPLIED DE
5 2022-35 12409 NOT COMPLIED US
6 2022-35 12409 NOT COMPLIED BR
7 2022-35 12406 COMPLIED MX
CodePudding user response:
Try this piece of code;
df1 = df[['Year-Week', 'Ticket']].drop_duplicates()
policy = []
for time,ticket in zip(df1['Year-Week'],df1['Ticket']):
df_temp = df[(df["Year-Week"] == time) & (df["Ticket"] == ticket)]
if "NOT COMPLIED" in df_temp["Policy"].to_list():
policy.extend(["NOT COMPLIED"] * df_temp.shape[0])
else:
policy.extend(["COMPLIED"] * df_temp.shape[0])
df["Policy"] = policy
Hope this helps...
CodePudding user response:
You can do a groupby followed by a transfrom and set it to the Policy column;
df['Policy'] = df.groupby(['Year-Week','Ticket']).transform(lambda x: 'NOT COMPLIED'if 'NOT COMPLIED' in x else 'COMPLIED')