I have following class structure
class Member:
def __init__(self, id):
self.member_id = id
self.items = []
class Item:
def __init__(self, id, name, last_updated):
self.item_id = id
self.last_updated = last_updated
self.name = name
I have a dictionary where I have list of Member objects for each key (it is generated based on some condition but it is something like nlm123, npl334, etc.). So the sample looks like thisL
{
"nlm123": [
{
"member_id": 1,
"items": [
{
"item_id": "1",
"last_updated": "2020-10-31 10:05:17",
"name": "shirt"
},
{
"item_id": "2",
"last_updated": "2020-10-30 10:05:17",
"name": "shirt_shirt"
}
]
},
{
"member_id": 2,
"items": [
{
"item_id": "1",
"last_updated": "2020-11-01 10:05:17",
"name": "shirt"
},
{
"item_id": "2",
"last_updated": "2020-11-04 10:05:17",
"name": "shirt_shirt"
}
]
}
]
}
I want an efficient way to iterate items of dictionary and identify object of Member which have the latest updated item based on last_updated
attribute. In my current implementation I end up using three for loops which I think is not the ideal solution. Output should be an object with "member_id": 2 because it has the latest updated item.
CodePudding user response:
I can't see this being possible without three loops, I guess you have something like this?
from datetime import datetime
def get_last_updated_member(data):
latest = None
last_updated = None
for members in data.values():
for member in members:
for item in member["items"]:
item_updated = datetime.strptime(item["last_updated"], "%Y-%m-%d %H:%M:%S")
if not last_updated or item_updated > last_updated:
last_updated = item_updated
latest = member
return latest
We can't do much else except some tricks to flatten out the code a bit:
import itertools
from datetime import datetime
def get_last_updated_member(data):
latest = None
last_updated = None
all_members = itertools.chain.from_iterable(data.values())
member_items = ((member, item) for member in all_members for item in member["items"])
for member, item in member_items:
item_updated = datetime.strptime(item["last_updated"], "%Y-%m-%d %H:%M:%S")
if not last_updated or item_updated > last_updated:
last_updated = item_updated
latest = member
return latest
We make use of two generators. This version is no more efficient than the original one (we iterate through the same number of items), but it's not worse either (we still only iterate through everything once and don't build any intermediate data structures).
The first generator is itertools.chain.from_iterable(data.values())
which flattens the data
dict by concatenating the members lists from each entry.
The second generator is ((member, item) for member in all_members for item in member["items"])
...it looks like list comprehension syntax but with (...)
instead of [...]
. This is called a "generator expression". This one basically gives us a convenient way to access the current parent member
while iterating over all the items
.
So now we can iterate over our generator and we only have one level of explicit for
loop. The inner code of the loop is the same as before.
Here is another alternative, using the reduce
function from functools
:
import itertools
from datetime import datetime
from functools import reduce
def get_last_updated_member(data):
all_members = itertools.chain.from_iterable(data.values())
member_last_updated = (
(member, datetime.strptime(item["last_updated"], "%Y-%m-%d %H:%M:%S"))
for member in all_members
for item in member["items"]
)
def reducer(result, current):
if not result[1] or current[1] > result[1]:
return current
return result
member, last_updated = reduce(reducer, member_last_updated, (None, None))
return member
Here we amend the generator to to return tuples of: (member, item_updated): Tuple[dict, datetime]
(we could have done this in the previous version too). Then the reduce
function iterates through the generator, applying the reducer
function that we defined to compare each item with the current best match and return the new best match. When the generator is exhausted reduce
returns the final best match to our code.
Arguably the original version is more readable than either of the latter two, despite the pythonic principle of "flat is better than nested".