How to remove duplicate rows in Python, keeping the row that has a value

I have a list that contains duplicates.

I want to remove the duplicate rows that do not have the value 'sh'.

The example below works on a small list, but on a large list it does not remove the right rows.

Is there another way to do the removal?

import pandas as pd

union_list = [
    ['10', 'robot_1', 'sh'],
    ['10', 'robot_1', ' '],
    ['11', 'robot_2', 'sh'],
    ['11', 'robot_2', ''],
    ['12', 'robot_3', ''],
]

df = pd.DataFrame(union_list)
# Keeps the first row for each value in column 0, whatever column 2 holds
df1 = df.drop_duplicates(0)
print(df1)

The result I want to get:

    0        1   2
0  10  robot_1  sh
2  11  robot_2  sh
4  12  robot_3

Answer:

If column 2 holds only empty (or blank) strings and 'sh', and you want to keep the 'sh' row when there are duplicates, you can sort by all columns, which places the empty strings before 'sh' within each group, and then call drop_duplicates keeping the last row:

df.sort_values(by=[0,1,2]).drop_duplicates(0, keep='last')

Alternatively, to always prioritize 'sh', whatever other values column 2 holds:

df.sort_values(by=2, key=lambda x: x=='sh').drop_duplicates(0, keep='last')

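Output (both approaches keep the same rows; the second returns them in a different order unless you append .sort_index()):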

    0        1   2
0  10  robot_1  sh
2  11  robot_2  sh
4  12  robot_3
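
To make the second approach easier to reuse, here is a minimal, self-contained sketch built on the sample data from the question. The helper name keep_sh_rows and its parameters are hypothetical, chosen only for illustration. It assumes pandas 1.1 or later (for the key argument of sort_values), uses kind='stable' so that tied rows keep their original relative order, and calls sort_index() at the end to restore the original row order.

```python
import pandas as pd

union_list = [
    ['10', 'robot_1', 'sh'],
    ['10', 'robot_1', ' '],
    ['11', 'robot_2', 'sh'],
    ['11', 'robot_2', ''],
    ['12', 'robot_3', ''],
]
df = pd.DataFrame(union_list)


def keep_sh_rows(frame, key_col=0, flag_col=2, flag='sh'):
    """Drop duplicates on key_col, preferring rows where flag_col == flag.

    Hypothetical helper for illustration; not part of pandas.
    """
    ordered = frame.sort_values(
        by=flag_col,
        key=lambda s: s == flag,  # False (not 'sh') sorts before True ('sh')
        kind='stable',            # tied rows keep their original order
    )
    # After sorting, an 'sh' row (if any) is the last one in its group
    deduped = ordered.drop_duplicates(subset=key_col, keep='last')
    return deduped.sort_index()   # restore the original row order


result = keep_sh_rows(df)
print(result)
```

For the sample data this keeps the rows with index 0, 2 and 4. Because the helper sorts explicitly, the result does not depend on whether the 'sh' row comes before or after its blank duplicate in the input, which is likely why a plain drop_duplicates(0) appeared to fail on the larger list.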