How to filter list of dictionaries in python?

Time:12-23

I have a list of dictionaries, as follows:

VehicleList = [
        {
            'id': '1',
            'VehicleType': 'Car',
            'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)
        },
        {
            'id': '2',
            'VehicleType': 'Bike',
            'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)
        },
        {
            'id': '3',
            'VehicleType': 'Truck',
            'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)
        },
        {
            'id': '4',
            'VehicleType': 'Bike',
            'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 00, 300012)
        },
        {
            'id': '5',
            'VehicleType': 'Car',
            'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)
        }
    ]

How can I get a list of the latest vehicles for each 'VehicleType' based on their 'CreationDate'?

I expect something like this:

latestVehicles = [
        {
            'id': '5',
            'VehicleType': 'Car',
            'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)
        },
        {
            'id': '2',
            'VehicleType': 'Bike',
            'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)
        },
        {
            'id': '3',
            'VehicleType': 'Truck',
            'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)
        }
    ]

I tried separating the dictionaries into one list per 'VehicleType', sorting each list by 'CreationDate', and then picking the latest entry from each.
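A minimal sketch of that attempt, for reference. The names `vehicles`, `by_type` and `latest_vehicles` are illustrative and not from the original code, and the sample data is a shortened copy of the list above:

```python
import datetime
from collections import defaultdict

# Shortened sample data (assumption: same shape as VehicleList above).
vehicles = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21)},
    {'id': '4', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 0)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50)},
]

# Step 1: separate the dictionaries into one list per VehicleType.
by_type = defaultdict(list)
for vehicle in vehicles:
    by_type[vehicle['VehicleType']].append(vehicle)

# Step 2: sort each list by CreationDate and keep the last (latest) entry.
latest_vehicles = [
    sorted(group, key=lambda v: v['CreationDate'])[-1]
    for group in by_type.values()
]
```

This works, but it sorts every group in full even though only the maximum of each group is needed.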

I believe there might be a more optimal way to do this.

CodePudding user response:

Use a dictionary that maps each VehicleType value to the dictionary you want in your final list. Compare the date of each item in the input list with the one already in your dict, and keep the later one.

latest_dict = {}

for vehicle in VehicleList:
    t = vehicle['VehicleType']
    if t not in latest_dict or vehicle['CreationDate'] > latest_dict[t]['CreationDate']:
        latest_dict[t] = vehicle

latestVehicles = list(latest_dict.values())
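The same single-pass idea can be wrapped in a reusable function. This is only a sketch: `latest_per_key` and its parameter names are hypothetical, and the data repeats the list from the question.

```python
import datetime

def latest_per_key(items, group_key, date_key):
    """Return, for each distinct value of group_key, the item with the largest date_key."""
    latest = {}
    for item in items:
        group = item[group_key]
        # Store the item if its group is new or it is strictly newer than the stored one.
        if group not in latest or item[date_key] > latest[group][date_key]:
            latest[group] = item
    return list(latest.values())

vehicles = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
    {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)},
    {'id': '4', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 0, 300012)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
]

result = latest_per_key(vehicles, 'VehicleType', 'CreationDate')
```

Each item is visited once and nothing is sorted, so the work grows linearly with the length of the list. Because the comparison is strict, the first item seen wins when two items of the same type share a date.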

CodePudding user response:

Here is a solution using max and filter:

VehicleLatest = [
    max(
        filter(lambda _: _["VehicleType"] == t, VehicleList), 
        key=lambda _: _["CreationDate"]
    ) for t in {_["VehicleType"] for _ in VehicleList}
]

Result

print(VehicleLatest)
# [{'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)}, {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}, {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)}]

CodePudding user response:

I think you can achieve what you want using the groupby function from itertools.

from itertools import groupby

# entries sorted according to the key we wish to groupby: 'VehicleType'
VehicleList = sorted(VehicleList, key=lambda x: x["VehicleType"])

latestVehicles = []

# Then the elements are grouped.
for k, v in groupby(VehicleList, lambda x: x["VehicleType"]):
    # We then append to latestVehicles the 0th entry of the
    # grouped elements after sorting according to the 'CreationDate'
    latestVehicles.append(sorted(list(v), key=lambda x: x["CreationDate"], reverse=True)[0])

CodePudding user response:

Sort by 'VehicleType' and 'CreationDate', then create a dictionary from 'VehicleType' and vehicle to get the latest vehicle for each type:

VehicleList.sort(key=lambda x: (x.get('VehicleType'), x.get('CreationDate')))
out = list(dict(zip([item.get('VehicleType') for item in VehicleList], VehicleList)).values())

Output:

[{'id': '2',
  'VehicleType': 'Bike',
  'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
 {'id': '5',
  'VehicleType': 'Car',
  'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
 {'id': '3',
  'VehicleType': 'Truck',
  'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}]

CodePudding user response:

This is very straightforward in pandas. First load the list of dicts as a pandas DataFrame, then sort the values by date in descending order, keep the first row for each 'VehicleType', and export to dict.

import pandas as pd

df = pd.DataFrame(VehicleList)
# After the descending sort, the first row kept per 'VehicleType' is the latest one.
df.sort_values('CreationDate', ascending=False).drop_duplicates('VehicleType').to_dict(orient='records')

CodePudding user response:

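You can use itemgetter from the operator module to sort by type and date. Sorting alone does not select anything, so you still need to keep the last entry of each type:

import operator
my_sorted_list_by_type_and_date = sorted(VehicleList, key=operator.itemgetter('VehicleType', 'CreationDate'))
# Later entries overwrite earlier ones, so each type ends up with its latest vehicle.
latestVehicles = list({v['VehicleType']: v for v in my_sorted_list_by_type_and_date}.values())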

CodePudding user response:

A small plea for more readable code:

from operator import itemgetter
from itertools import groupby

vtkey = itemgetter('VehicleType')
cdkey = itemgetter('CreationDate')

latest = [
    # Get latest from each group.
    max(vs, key = cdkey)
    # Sort and group by VehicleType.
    for g, vs in groupby(sorted(VehicleList, key = vtkey), vtkey)
]

CodePudding user response:

A variation on Blckknght's answer using defaultdict to avoid the long if condition:

from collections import defaultdict
import datetime
from operator import itemgetter

latest_dict = defaultdict(lambda: {'CreationDate': datetime.datetime.min})

for vehicle in VehicleList:
    t = vehicle['VehicleType']
    latest_dict[t] = max(vehicle, latest_dict[t], key=itemgetter('CreationDate'))

latestVehicles = list(latest_dict.values())

latestVehicles:

[{'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
 {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
 {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}]