The DataFrame I am using is as follows:
Name  NoOfTrans  Avg_pass_time  Cons.Error            RunCounts
Jan   0                         Failed:abcd           4
Jan                                                   4
Jan                                                   4
Jan                                                   4
May   2                         Failed:abcFailed:cde  5
May                                                   5
May              1200                                 5
May              1200                                 5
May                                                   5
I need to remove the duplicates in the "Name", "Avg_pass_time" and "RunCounts" columns, grouped by "Name", so that each Name is collapsed into a single row with its non-empty values:
Name  NoOfTrans  Avg_pass_time  Cons.Error            RunCounts
Jan   0                         Failed:abcd           4
May   2          1200           Failed:abcFailed:cde  5
Any guidance would be useful.
User response:
If each group contains only empty strings or duplicated values, use:
import numpy as np
df = df.replace('', np.nan).groupby('Name', as_index=False).first().fillna('')
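A minimal sketch to check this answer against the sample data from the question. It assumes the blank cells are empty strings and that every value is stored as a string; the DataFrame constructor is a reconstruction, not the asker's actual code.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: blank cells are
# empty strings and all values are strings).
df = pd.DataFrame({
    "Name": ["Jan"] * 4 + ["May"] * 5,
    "NoOfTrans": ["0", "", "", "", "2", "", "", "", ""],
    "Avg_pass_time": ["", "", "", "", "", "", "1200", "1200", ""],
    "Cons.Error": ["Failed:abcd", "", "", "", "Failed:abcFailed:cde", "", "", "", ""],
    "RunCounts": ["4"] * 4 + ["5"] * 5,
})

# Turn empty strings into NaN so GroupBy.first() skips them, take the first
# non-null value of each column per Name, then turn leftover NaN back into ''.
result = (
    df.replace("", np.nan)
      .groupby("Name", as_index=False)
      .first()
      .fillna("")
)
print(result)
```

GroupBy.first() returns the first non-null value of each column within a group, which is why the empty strings are converted to NaN beforehand. Jan's Avg_pass_time has no non-null value at all, so it comes back as NaN and is filled with '' again.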
User response:
You can pass a subset of columns to consider when dropping duplicates:
df = df.drop_duplicates(subset=['Name','Avg_pass_time','RunCounts'])
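This is untested. It keeps the first row of every distinct combination of the three columns, so rows with the same Name but different Avg_pass_time values are all kept.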
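A minimal sketch of what this call keeps on the sample data from the question, under the same assumptions as before (blank cells are empty strings, all values are strings; the constructor is a reconstruction, not the asker's code):

```python
import pandas as pd

# Sample data reconstructed from the question (assumption: blank cells are
# empty strings and all values are strings).
df = pd.DataFrame({
    "Name": ["Jan"] * 4 + ["May"] * 5,
    "NoOfTrans": ["0", "", "", "", "2", "", "", "", ""],
    "Avg_pass_time": ["", "", "", "", "", "", "1200", "1200", ""],
    "Cons.Error": ["Failed:abcd", "", "", "", "Failed:abcFailed:cde", "", "", "", ""],
    "RunCounts": ["4"] * 4 + ["5"] * 5,
})

# Keep the first row of every distinct (Name, Avg_pass_time, RunCounts)
# combination; keep="first" is the default.
deduped = df.drop_duplicates(subset=["Name", "Avg_pass_time", "RunCounts"])
print(deduped)
```

With this data May keeps two rows (original index 4 with an empty Avg_pass_time and index 6 with 1200), and neither row carries all of May's values, so the result differs from the one-row-per-Name output the question asks for.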