Say, I have the following two lists:
list1 = ['A', 'A', 'B', 'B', 'C', 'D']
list2 = ['x', 'y', 'y', 'x', 'x', 'y']
For every value that occurs more than once in list1, I want to drop the occurrences whose corresponding element in list2 is not 'y' (together with those list2 elements), keeping unique values regardless of their list2 entry.
Expected outcome:
list1 = ['A', 'B', 'C', 'D']
list2 = ['y', 'y', 'x', 'y']
The final goal is to continue working with the indices of the kept rows; for the example above that would be:
index = [1, 2, 4, 5]
I tried solving this with pandas:
df = pd.DataFrame(zip(list1, list2), columns=["l1", "l2"])
df = df[(~(df.duplicated(['l1']))) | (df.duplicated(['l1']) & df.l2.eq('y'))]
But this does not give me the correct result. Note that I cannot simply drop the first or last duplicate, as 'x' and 'y' do not appear in any fixed order.
A solution with pandas would be fine, but is not necessary; a solution with a list comprehension would also be fine...
CodePudding user response:
You could use duplicated with keep=False, which marks every occurrence of a duplicated value (your attempt used the default keep='first', which leaves the first occurrence unmarked, so the 'A' at index 0 survived):

# keep a row if: l1 is not duplicated at all OR l2 == "y"
df[~df['l1'].duplicated(keep=False) | df['l2'].eq('y')]
output:
  l1 l2
1  A  y
2  B  y
4  C  x
5  D  y
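For completeness, a sketch of how you could recover the indices and rebuild the two lists from the filtered frame, plus a plain-Python equivalent with a list comprehension (the names out, new_list1, new_list2 and index_lc are my own):

```python
import pandas as pd
from collections import Counter

list1 = ['A', 'A', 'B', 'B', 'C', 'D']
list2 = ['x', 'y', 'y', 'x', 'x', 'y']

df = pd.DataFrame(zip(list1, list2), columns=["l1", "l2"])
# keep=False flags every occurrence of a duplicated value,
# not just the repeats after the first
out = df[~df['l1'].duplicated(keep=False) | df['l2'].eq('y')]

index = out.index.tolist()       # [1, 2, 4, 5]
new_list1 = out['l1'].tolist()   # ['A', 'B', 'C', 'D']
new_list2 = out['l2'].tolist()   # ['y', 'y', 'x', 'y']

# plain-Python equivalent: count occurrences first, then keep an
# index if its list1 value is unique or its list2 value is 'y'
counts = Counter(list1)
index_lc = [i for i, (a, b) in enumerate(zip(list1, list2))
            if counts[a] == 1 or b == 'y']   # [1, 2, 4, 5]
```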