Remove duplicates by a given condition and return a unique list of tuples in Python

I have a list of tuples, as below:

ls = [("red", "apple"), ("black", "grapes"),
      ("green", "apple"), ("yellow", "banana"),
      ("white", "litchi"), ("brown", "grapes")]

If you notice, I have both red and green "apple" as well as black and brown "grapes". I want to remove one tuple from each such pair and keep the other, so the output should look like:

output = [("red", "apple"), ("black", "grapes"),
          ("yellow", "banana"), ("white", "litchi")]

So in the output, (green apple) and (brown grapes) are removed.

Is there any way to achieve this? I tried many times but could not figure it out. Please help.. :)

CodePudding user response：

If you need to remove duplicates by the second value of each tuple, use DataFrame.drop_duplicates:

import pandas as pd

a = pd.DataFrame(ls).drop_duplicates(subset=[1]).apply(tuple, axis=1).tolist()
print(a)
[('red', 'apple'), ('black', 'grapes'), ('yellow', 'banana'), ('white', 'litchi')]

CodePudding user response：

I managed to do that by converting the list to a Pandas DataFrame, removing the duplicates based on the "fruit" column, and then converting it back to a list of tuples.

import pandas as pd

ls = [("red", "apple"), ("black", "grapes"),
      ("green", "apple"), ("yellow", "banana"),
      ("white", "litchi"), ("brown", "grapes")]

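df = pd.DataFrame(ls, columns=["color", "fruit"])
# keep="first" retains the first row seen for each fruit
df = df.drop_duplicates(subset=["fruit"], keep="first")

# itertuples(name=None) yields plain tuples rather than NumPy records
print(list(df.itertuples(index=False, name=None)))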

CodePudding user response：

pandas is overkill for this. It can be done without importing any extra modules.

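Create an intermediate dictionary keyed by fruit, then reconstruct the list of tuples from it. Later duplicates overwrite earlier ones, so the last color seen for each fruit is kept: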

ls = [("red", "apple"), ("black", "grapes"),
      ("green", "apple"), ("yellow", "banana"),
      ("white", "litchi"), ("brown", "grapes")]

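# Inner dict maps fruit -> color; outer comprehension flips it back to (color, fruit)
d = [(color, fruit) for fruit, color in {fruit: color for color, fruit in ls}.items()]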

print(d)

Output:

[('green', 'apple'), ('brown', 'grapes'), ('yellow', 'banana'), ('white', 'litchi')]
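
The dictionary approach keeps the last color seen for each fruit, whereas the output in the question keeps the first. If the first occurrence should win, a minimal sketch that tracks already-seen fruits in a set could look like the following. The function name unique_by_second is hypothetical and only used for illustration; it assumes every element is a 2-tuple whose second value is hashable.

```python
def unique_by_second(pairs):
    """Return pairs with duplicates (by second element) dropped, keeping the first seen."""
    seen = set()      # second elements encountered so far
    result = []
    for first, second in pairs:
        if second not in seen:
            seen.add(second)
            result.append((first, second))
    return result


ls = [("red", "apple"), ("black", "grapes"),
      ("green", "apple"), ("yellow", "banana"),
      ("white", "litchi"), ("brown", "grapes")]

output = unique_by_second(ls)
print(output)
# [('red', 'apple'), ('black', 'grapes'), ('yellow', 'banana'), ('white', 'litchi')]
```

This also needs no extra modules, makes a single pass over the list, and preserves the original order of the tuples it keeps.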