I have a dataframe as following:
Year-Week Ticket Policy User 2022-35 12502 COMPLIED MX 2022-35 12502 NOT COMPLIED US 2022-35 12502 COMPLIED AL 2022-35 12502 COMPLIED BR 2022-35 12471 COMPLIED DE 2022-35 12409 COMPLIED US 2022-35 12409 NOT COMPLIED BR 2022-35 12406 COMPLIED MX
There are cases that are duplicated and that is okay since they are from different Users, however, how could I standardize that the same ticket has the same policy? E.g: If a ticket in a week has "NOT COMPLIED", the policy should be the same for that ticket no matter the week, so, should be something like this:
Output Sample
Year-Week Ticket Policy User 2022-35 12502 NOT COMPLIED MX 2022-35 12502 NOT COMPLIED US 2022-35 12502 NOT COMPLIED AL 2022-35 12502 NOT COMPLIED BR 2022-35 12471 COMPLIED DE 2022-35 12409 NOT COMPLIED US 2022-35 12409 NOT COMPLIED BR 2022-35 12406 COMPLIED MX
Basically what I want is the dominance of NOT COMPLIED over COMPLIED when there are duplicated Tickets.
CodePudding user response:
Leverage the fact that "N"
comes after "C"
on the ASCII table. So max()
would take "NOT COMPLIED" over "COMPLIED". As you want to standardize it over Tickets, call max()
as groupby().transform()
.
df['Policy'] = df.groupby('Ticket')['Policy'].transform('max')
Year-Week Ticket Policy User
0 2022-35 12502 NOT COMPLIED MX
1 2022-35 12502 NOT COMPLIED US
2 2022-35 12502 NOT COMPLIED AL
3 2022-35 12502 NOT COMPLIED BR
4 2022-35 12471 COMPLIED DE
5 2022-35 12409 NOT COMPLIED US
6 2022-35 12409 NOT COMPLIED BR
7 2022-35 12406 COMPLIED MX
CodePudding user response:
Try this piece of code;
df1 = df[['Year-Week', 'Ticket']].drop_duplicates()
policy = []
for time,ticket in zip(df1['Year-Week'],df1['Ticket']):
df_temp = df[(df["Year-Week"] == time) & (df["Ticket"] == ticket)]
if "NOT COMPLIED" in df_temp["Policy"].to_list():
policy.extend(["NOT COMPLIED"] * df_temp.shape[0])
else:
policy.extend(["COMPLIED"] * df_temp.shape[0])
df["Policy"] = policy
Hope this helps...
CodePudding user response:
You can do a groupby followed by a transfrom and set it to the Policy column;
df['Policy'] = df.groupby(['Year-Week','Ticket']).transform(lambda x: 'NOT COMPLIED'if 'NOT COMPLIED' in x else 'COMPLIED')