Remove duplicates of list based on condition


Say, I have the following two lists:

list1 = ['A', 'A', 'B', 'B', 'C', 'D']
list2 = ['x', 'y', 'y', 'x', 'x', 'y']

I want to eliminate all duplicates of list1 and their corresponding elements in list2 based on the condition that the corresponding element of the duplicate in list2 is 'y'.

Expected outcome:

list1 = ['A', 'B', 'C', 'D']
list2 = ['y', 'y', 'x', 'y']

The final goal is to continue working with the indices of the rows that are kept; for the example above, that would be:

index = [1, 2, 4, 5]

I tried solving this with pandas:

import pandas as pd

df = pd.DataFrame(zip(list1, list2), columns=["l1", "l2"])
df = df[(~df.duplicated(['l1'])) | (df.duplicated(['l1']) & df.l2.eq('y'))]

But this does not give me the correct result. Please note that I cannot simply drop the first or last occurrence, as 'x' and 'y' do not necessarily appear in the same order.

A solution with pandas would be fine, but is not necessary; a solution with a list comprehension would also be fine.
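(For comparison, here is a plain-Python sketch of the stated rule without pandas, using `collections.Counter` to detect duplicates; the variable names `index`, `new1`, `new2` are just illustrative:)

```python
from collections import Counter

list1 = ['A', 'A', 'B', 'B', 'C', 'D']
list2 = ['x', 'y', 'y', 'x', 'x', 'y']

# Keep index i if list1[i] is unique, or its partner in list2 is 'y'
counts = Counter(list1)
index = [i for i, (a, b) in enumerate(zip(list1, list2))
         if counts[a] == 1 or b == 'y']

new1 = [list1[i] for i in index]  # ['A', 'B', 'C', 'D']
new2 = [list2[i] for i in index]  # ['y', 'y', 'x', 'y']
```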

CodePudding user response:

You could use:

# keep a row if: l1 is not duplicated at all  OR  l2 == "y"
df[~df['l1'].duplicated(keep=False) | df['l2'].eq('y')]

output:

  l1 l2
1  A  y
2  B  y
4  C  x
5  D  y
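Since your final goal is the indices of the kept rows, the filtered frame's index gives them directly; a minimal end-to-end sketch:

```python
import pandas as pd

list1 = ['A', 'A', 'B', 'B', 'C', 'D']
list2 = ['x', 'y', 'y', 'x', 'x', 'y']

df = pd.DataFrame(zip(list1, list2), columns=["l1", "l2"])

# duplicated(keep=False) marks *every* occurrence of a repeated value,
# so unique values pass the first condition and duplicates need l2 == 'y'
out = df[~df['l1'].duplicated(keep=False) | df['l2'].eq('y')]

index = out.index.tolist()   # [1, 2, 4, 5]
list1 = out['l1'].tolist()   # ['A', 'B', 'C', 'D']
list2 = out['l2'].tolist()   # ['y', 'y', 'x', 'y']
```

The key difference from your attempt is `keep=False`: plain `duplicated()` leaves the first occurrence unmarked, so the first 'A' (with l2 == 'x') slipped through.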