Remove duplicates from list of lists by column value


I have a list of dicts that looks like this. It has been sorted so that rows with duplicate IDs are grouped together, with the one I want to keep first:

[
    {'id': '23', 'type': 'car', 'price': '445'},
    {'id': '23', 'type': 'car', 'price': '78'},
    {'id': '23', 'type': 'car', 'price': '34'},
    {'id': '125', 'type': 'truck', 'price': '998'},
    {'id': '125', 'type': 'truck', 'price': '722'},
    {'id': '125', 'type': 'truck', 'price': '100'},
    {'id': '87', 'type': 'bike', 'price': '50'},
]

What is the simplest way to remove rows that have duplicate IDs while always keeping the first one? In this instance the end result would look like this:

[
    {'id': '23', 'type': 'car', 'price': '445'},
    {'id': '125', 'type': 'truck', 'price': '998'},
    {'id': '87', 'type': 'bike', 'price': '50'},
]

I know I can remove duplicates from a list by converting it to a set with set(my_list), but in this instance I want to deduplicate by ID, not by whole row.

CodePudding user response:

Here's an answer that involves no external modules or unnecessary manipulation of the data:

data = [
    {'id': '23', 'type': 'car', 'price': '445'},
    {'id': '23', 'type': 'car', 'price': '78'},
    {'id': '23', 'type': 'car', 'price': '34'},
    {'id': '125', 'type': 'truck', 'price': '998'},
    {'id': '125', 'type': 'truck', 'price': '722'},
    {'id': '125', 'type': 'truck', 'price': '100'},
    {'id': '87', 'type': 'bike', 'price': '50'},
]
seen = set()
result = [row for row in data if row['id'] not in seen and not seen.add(row['id'])]
print(result)

Result:

[{'id': '23', 'type': 'car', 'price': '445'},
 {'id': '125', 'type': 'truck', 'price': '998'},
 {'id': '87', 'type': 'bike', 'price': '50'}]

Note that the not seen.add(row['id']) part of the list comprehension will always be True, because set.add returns None. It's just a way of recording that a unique ID has been seen by adding it to the seen set.
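If the side effect inside the comprehension feels too clever, the same "keep the first occurrence" behavior can be sketched with a plain dict, since dict.setdefault only stores a value for a key that is not already present (and dicts preserve insertion order in Python 3.7+). This is an alternative sketch, not part of the answer above:

```python
data = [
    {'id': '23', 'type': 'car', 'price': '445'},
    {'id': '23', 'type': 'car', 'price': '78'},
    {'id': '125', 'type': 'truck', 'price': '998'},
    {'id': '87', 'type': 'bike', 'price': '50'},
]

by_id = {}
for row in data:
    # setdefault only stores row if this id is not already a key,
    # so the first occurrence of each id wins
    by_id.setdefault(row['id'], row)

result = list(by_id.values())
print(result)
```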

CodePudding user response:

Since you already have the list sorted properly, a simple way to do this is to use itertools.groupby and grab the first element of each group in a list comprehension:

from itertools import groupby

l = [
    {'id': '23', 'type': 'car', 'price': '445'},
    {'id': '23', 'type': 'car', 'price': '78'},
    {'id': '23', 'type': 'car', 'price': '34'},
    {'id': '125', 'type': 'truck', 'price': '998'},
    {'id': '125', 'type': 'truck', 'price': '722'},
    {'id': '125', 'type': 'truck', 'price': '100'},
    {'id': '87', 'type': 'bike', 'price': '50'},
]

[next(g) for k, g in groupby(l, key=lambda d: d['id'])]

# [{'id': '23', 'type': 'car', 'price': '445'},
#  {'id': '125', 'type': 'truck', 'price': '998'},
#  {'id': '87', 'type': 'bike', 'price': '50'}]
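One caveat worth noting: groupby only groups consecutive items, so this approach relies on the duplicates being adjacent (as they are in the sorted input above). A quick sketch of what happens when they are not:

```python
from itertools import groupby

# duplicate ids are NOT adjacent here
unsorted = [
    {'id': '23', 'price': '445'},
    {'id': '87', 'price': '50'},
    {'id': '23', 'price': '78'},
]

deduped = [next(g) for k, g in groupby(unsorted, key=lambda d: d['id'])]

# '23' survives twice because groupby started a new group
# each time the id changed
print([d['id'] for d in deduped])  # ['23', '87', '23']
```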

CodePudding user response:

I would probably convert to a Pandas DataFrame and then use drop_duplicates, which keeps the first occurrence by default:

import pandas as pd
data = [
    {'id': '23', 'type': 'car', 'price': '445'},
    {'id': '23', 'type': 'car', 'price': '78'},
    {'id': '23', 'type': 'car', 'price': '34'},
    {'id': '125', 'type': 'truck', 'price': '998'},
    {'id': '125', 'type': 'truck', 'price': '722'},
    {'id': '125', 'type': 'truck', 'price': '100'},
    {'id': '87', 'type': 'bike', 'price': '50'},
]
df = pd.DataFrame(data)
df.drop_duplicates(subset=['id'], inplace=True)
print(df.to_dict('records'))

# Output
# [{'id': '23', 'type': 'car', 'price': '445'}, 
# {'id': '125', 'type': 'truck', 'price': '998'}, 
# {'id': '87', 'type': 'bike', 'price': '50'}]

CodePudding user response:

Assuming the given list is named data:

unique_ids = []
result = []
for item in data:
    if item["id"] not in unique_ids:
        result.append(item)
        unique_ids.append(item["id"])
print(result)

The result will be:

[{'id': '23', 'type': 'car', 'price': '445'},
 {'id': '125', 'type': 'truck', 'price': '998'},
 {'id': '87', 'type': 'bike', 'price': '50'}]
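The loop above generalizes nicely to a small reusable helper that deduplicates by any key while keeping the first occurrence. The helper name unique_by is hypothetical, not from any of the answers, and a set is used instead of a list for faster membership checks:

```python
def unique_by(rows, key):
    """Keep the first row for each distinct key(row) value."""
    seen = set()
    result = []
    for row in rows:
        k = key(row)
        if k not in seen:  # set lookup is O(1), unlike list membership
            seen.add(k)
            result.append(row)
    return result

data = [
    {'id': '23', 'type': 'car', 'price': '445'},
    {'id': '23', 'type': 'car', 'price': '78'},
    {'id': '125', 'type': 'truck', 'price': '998'},
    {'id': '87', 'type': 'bike', 'price': '50'},
]

print(unique_by(data, key=lambda r: r['id']))
```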