I have a list of tuples as below:
ls = [("red", "apple"), ("black", "grapes"),
("green", "apple"), ("yellow", "banana"),
("white", "litchi"), ("brown", "grapes")]
Notice that "apple" appears as both red and green, and "grapes" as both black and brown. I want to keep only one tuple per fruit and remove the other, so the output should look like:
output = [("red", "apple"), ("black", "grapes"),
("yellow", "banana"), ("white", "litchi")]
So in the output, (green, apple) and (brown, grapes) are removed.
Is there any way to achieve this? I tried many times but could not figure it out. Please help. :)
CodePudding user response:
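If you need to remove duplicates by the second value of each tuple, use DataFrame.drop_duplicates:
import pandas as pd
# Column 1 holds the fruit; drop repeated fruits, then turn each row back into a tuple
a = pd.DataFrame(ls).drop_duplicates(subset=[1]).apply(tuple, axis=1).tolist()
print(a)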
[('red', 'apple'), ('black', 'grapes'), ('yellow', 'banana'), ('white', 'litchi')]
CodePudding user response:
I managed to do that by converting the list to a pandas DataFrame, removing the duplicates based on the "fruit" column, and then converting it back to a list of tuples.
import pandas as pd
ls = [("red", "apple"), ("black", "grapes"),
("green", "apple"), ("yellow", "banana"),
("white", "litchi"), ("brown", "grapes")]
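df = pd.DataFrame(ls, columns=["color", "fruit"])
# Keep the first row seen for each fruit
df = df.drop_duplicates(subset=["fruit"], keep="first")
# itertuples(name=None) yields plain tuples rather than NumPy records
print(list(df.itertuples(index=False, name=None)))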
CodePudding user response:
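pandas is overkill for this; it can be done without importing any extra modules.
Create an intermediate dictionary keyed by fruit, then reconstruct the list of tuples from it. Because later entries overwrite earlier ones, this keeps the last color seen for each fruit: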
ls = [("red", "apple"), ("black", "grapes"),
("green", "apple"), ("yellow", "banana"),
("white", "litchi"), ("brown", "grapes")]
d = [(v, k) for k, v in {v:k for k, v in ls}.items()]
print(d)
Output:
[('green', 'apple'), ('brown', 'grapes'), ('yellow', 'banana'), ('white', 'litchi')]
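If the first tuple for each fruit should be kept instead (as in the output shown in the question), one possible sketch is to track the fruits already seen in a set. The function name dedupe_by_second is only illustrative and does not come from any library:

```python
def dedupe_by_second(pairs):
    """Return pairs with only the first tuple kept for each second element."""
    seen = set()    # second elements (fruits) encountered so far
    result = []
    for first, second in pairs:
        if second not in seen:
            seen.add(second)
            result.append((first, second))
    return result


ls = [("red", "apple"), ("black", "grapes"),
      ("green", "apple"), ("yellow", "banana"),
      ("white", "litchi"), ("brown", "grapes")]

result = dedupe_by_second(ls)
print(result)
```

This makes a single pass over the list and preserves the original order, at the cost of being a few lines longer than the dictionary one-liner.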