Python dictionary comprehension to group together equal keys

I have a code snippet that groups together equal keys from a list of dicts and adds each dict with the same OBJECTID to a list under that key. The code below works, but I am trying to convert it to a dictionary comprehension.

Group together subblocks if they have equal OBJECTID:

output = {}
subblkDBF: list[dict]  # list of row dicts, defined elsewhere
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = []
    output[row["OBJECTID"]].append(row)

CodePudding user response：

Using a comprehension is possible, but likely inefficient in this case, since you need to (a) check whether a key is in the dictionary at every iteration, and (b) append to, rather than set, the value. You can, however, eliminate some of the boilerplate using collections.defaultdict:

from collections import defaultdict

output = defaultdict(list)
for row in subblkDBF:
    output[row['OBJECTID']].append(row)
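For reference, here is a minimal self-contained sketch of the defaultdict approach. The sample rows and the NAME field are made up for illustration; only the OBJECTID key comes from the question.

```python
from collections import defaultdict

# Hypothetical sample data; only the "OBJECTID" key comes from the question.
subblkDBF = [
    {"OBJECTID": 1, "NAME": "a"},
    {"OBJECTID": 2, "NAME": "b"},
    {"OBJECTID": 1, "NAME": "c"},
]

output = defaultdict(list)
for row in subblkDBF:
    # A missing key is created with an empty list on first access.
    output[row["OBJECTID"]].append(row)

# Convert to a plain dict so that later lookups of missing keys raise KeyError
# instead of silently creating empty lists.
output = dict(output)
print(output)
```

The final dict() conversion is optional; it only matters if the result is handed to code that expects a missing key to raise an error.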

The problem with using a comprehension is that if you really want a one-liner, you have to nest a list comprehension that traverses the entire list multiple times (once for each key):

{k: [d for d in subblkDBF if d['OBJECTID'] == k] for k in set(d['OBJECTID'] for d in subblkDBF)}

Iterating over subblkDBF in both the inner and outer loop costs O(n*k) for n rows and k distinct keys, which becomes O(n^2) when most keys are unique. That is pointless, especially given how illegible the result is.
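To make the cost concrete, the sketch below unrolls the one-liner into explicit loops and counts the comparisons it performs. The sample data and the comparisons counter are assumptions added for illustration.

```python
# Hypothetical sample data: 6 rows spread over 3 distinct OBJECTID values.
subblkDBF = [{"OBJECTID": i % 3, "NAME": f"row{i}"} for i in range(6)]

keys = set(d["OBJECTID"] for d in subblkDBF)

# Same logic as the nested comprehension, unrolled so comparisons can be counted.
comparisons = 0
output = {}
for k in keys:
    matches = []
    for d in subblkDBF:
        comparisons += 1
        if d["OBJECTID"] == k:
            matches.append(d)
    output[k] = matches

print(comparisons)  # 18, i.e. 3 keys * 6 rows
```

A single-pass loop touches each of the 6 rows once, so the gap widens as the number of distinct keys grows.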

As the other answer shows, these problems go away if you're willing to sort the list first, or better yet, if it is already sorted.

CodePudding user response：

If the rows are sorted by OBJECTID (or if all rows with the same OBJECTID are at least next to each other, no matter the overall order of those IDs), you could write a neat dict comprehension using itertools.groupby:

from itertools import groupby
from operator import itemgetter

output = {k: list(g) for k, g in groupby(subblkDBF, key=itemgetter("OBJECTID"))}

However, if this is not the case, you'd have to sort by the same key first, making this a lot less neat, and less efficient than the version above or the plain loop (O(n log n) instead of O(n)):

key = itemgetter("OBJECTID")
output = {k: list(g) for k, g in groupby(sorted(subblkDBF, key=key), key=key)}
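The sorting requirement is easy to miss because groupby does not raise an error on unsorted input. It starts a new group whenever the key changes, and the dict comprehension then overwrites the earlier group for a repeated key. A small sketch with made-up rows (the NAME field is an assumption for illustration):

```python
from itertools import groupby
from operator import itemgetter

key = itemgetter("OBJECTID")

# Hypothetical rows where the two OBJECTID == 1 entries are NOT adjacent.
rows = [
    {"OBJECTID": 1, "NAME": "a"},
    {"OBJECTID": 2, "NAME": "b"},
    {"OBJECTID": 1, "NAME": "c"},
]

# Without sorting, groupby yields key 1 twice and the second group wins.
unsorted_groups = {k: list(g) for k, g in groupby(rows, key=key)}

# sorted() is stable, so rows keep their original order within each group.
sorted_groups = {k: list(g) for k, g in groupby(sorted(rows, key=key), key=key)}

print(len(unsorted_groups[1]))  # 1 (row "a" was silently dropped)
print(len(sorted_groups[1]))    # 2
```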

CodePudding user response：

You can add an else block to slightly improve performance. When a key is new, its list is created with the row already in it, so the extra lookup and append are skipped:

output = {}
subblkDBF: list[dict]  # list of row dicts, defined elsewhere
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = [row]
    else:
        output[row["OBJECTID"]].append(row)
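Whether the else branch makes a measurable difference depends on the data and the Python version, so it is worth timing on your own rows. The sketch below is one way to do that with timeit; the function names, the generated rows, and the repeat count are all assumptions for illustration, and no particular result is implied.

```python
import timeit
from collections import defaultdict

# Hypothetical data: 10,000 rows spread over 100 distinct OBJECTID values.
rows = [{"OBJECTID": i % 100, "VALUE": i} for i in range(10_000)]

def group_if_only(rows):
    # Original loop: create the list if needed, then always append.
    output = {}
    for row in rows:
        if row["OBJECTID"] not in output:
            output[row["OBJECTID"]] = []
        output[row["OBJECTID"]].append(row)
    return output

def group_if_else(rows):
    # Variant with else: a new key gets its list with the row already inside.
    output = {}
    for row in rows:
        if row["OBJECTID"] not in output:
            output[row["OBJECTID"]] = [row]
        else:
            output[row["OBJECTID"]].append(row)
    return output

def group_defaultdict(rows):
    # defaultdict variant for comparison.
    output = defaultdict(list)
    for row in rows:
        output[row["OBJECTID"]].append(row)
    return dict(output)

# All three variants must produce the same grouping before timing them.
assert group_if_only(rows) == group_if_else(rows) == group_defaultdict(rows)

for fn in (group_if_only, group_if_else, group_defaultdict):
    seconds = timeit.timeit(lambda: fn(rows), number=20)
    print(f"{fn.__name__}: {seconds:.3f}s")
```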