How to get values from a list of dictionaries, which themselves contain lists of dictionaries, in Python

I ran into the problem of getting values from a list of dictionaries, where each dictionary holds a list that contains a dictionary. It may sound easy, but it took me some time, so I think posting it could be useful for other people. Here is an example of my data:

player_info = [{[{'tag': 'tag 1'}]},
               {[{'tag': 'tag 2'}]}]

The outer list is called 'player_info'. It contains 25 dictionaries, and each one holds (among other things) an 'opponent' key whose value is a list containing a dictionary (yeah, pretty messy). From that innermost dictionary, I wanted the value associated with the 'tag' key.
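The literal above cannot actually be evaluated (the answer below explains why). A valid Python literal matching the description might look like the following. This is a hypothetical reconstruction: the 'opponent' and 'tag' keys come from the code in this question, and the "other things" each dictionary holds are left out.

```python
# Hypothetical reconstruction of the described structure:
# a list of dicts, each with an 'opponent' key holding a list of dicts.
player_info = [
    {"opponent": [{"tag": "tag 1"}]},
    {"opponent": [{"tag": "tag 2"}]},
]

# outer list index -> 'opponent' key -> first list element -> 'tag' key
first_tag = player_info[0]["opponent"][0]["tag"]
print(first_tag)  # tag 1
```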

I figured out two ways:

Create a loop.

for i in range(25):
    print(player_info[i]['opponent'][0]['tag'])

Iterate through the list with a comprehension:

[each_dictionary['opponent'][0]['tag'] for each_dictionary in player_info]
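Two details are worth separating here. Iterating over the dictionaries directly avoids hard-coding 25 and indexing by position, and the bracket type decides what a comprehension builds: curly braces produce a set (duplicates dropped, no guaranteed order), square brackets produce a list. A minimal sketch, using made-up data with a repeated tag to make the difference visible:

```python
# Hypothetical sample data; 'tag 1' appears twice on purpose
player_info = [
    {"opponent": [{"tag": "tag 1"}]},
    {"opponent": [{"tag": "tag 2"}]},
    {"opponent": [{"tag": "tag 1"}]},
]

# Loop over the dictionaries themselves instead of over range(25)
loop_tags = []
for each_dictionary in player_info:
    loop_tags.append(each_dictionary["opponent"][0]["tag"])

# Square brackets: list comprehension, keeps order and duplicates
list_tags = [d["opponent"][0]["tag"] for d in player_info]

# Curly braces: set comprehension, drops duplicates, order not guaranteed
set_tags = {d["opponent"][0]["tag"] for d in player_info}

print(loop_tags)      # ['tag 1', 'tag 2', 'tag 1']
print(list_tags)      # ['tag 1', 'tag 2', 'tag 1']
print(len(set_tags))  # 2
```

If only the distinct tags are wanted, the set version is fine; if every opponent matters, the list version is the safer default.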

I assume that the second way must be more efficient. Let me know what you think, and whether there is a smarter way to do it.
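If "smarter" means more robust rather than faster, one option is a small function that tolerates incomplete records. This is a sketch under assumptions: the helper name `get_opponent_tags` is hypothetical, and the policy of silently skipping entries with a missing or empty 'opponent' list is a design choice, not something the original data requires.

```python
from typing import Any, Dict, List


def get_opponent_tags(players: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: collect the first opponent's 'tag' for each player.

    Entries without an 'opponent' key, with an empty opponent list,
    or whose first opponent has no 'tag' are skipped instead of raising.
    """
    tags = []
    for player in players:
        # .get returns None for a missing key; 'or []' turns None into []
        opponents = player.get("opponent") or []
        if opponents and "tag" in opponents[0]:
            tags.append(opponents[0]["tag"])
    return tags


player_info = [
    {"opponent": [{"tag": "tag 1"}]},
    {"opponent": []},         # empty list: skipped
    {"name": "no opponent"},  # missing key: skipped
    {"opponent": [{"tag": "tag 2"}]},
]

print(get_opponent_tags(player_info))  # ['tag 1', 'tag 2']
```

The typing aliases from `typing` are used so the annotations also work on Python 3.8.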

Answer:

First: a dict requires a key-value pair for every element. Your second-level data structure, though, does not include keys: {[{'tag': 'tag 1'}]} is a set literal, not a dict. Unlike dicts, sets do not associate keys with their elements. So your data structure looks like List[Set[List[Dict[str, str]]]].
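Python decides between a dict and a set from what is inside the braces, not from the braces themselves. A short check:

```python
# Braces with key: value pairs create a dict
with_keys = {"tag": "tag 1"}

# Braces with bare elements create a set
without_keys = {"tag 1", "tag 2"}

# Empty braces create a dict; an empty set must be written as set()
empty_braces = {}

print(type(with_keys).__name__)     # dict
print(type(without_keys).__name__)  # set
print(type(empty_braces).__name__)  # dict
```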

Second: when I try to run

# python 3.8.8
player_info = [{[{'tag': 'tag 1'}]},
               {[{'tag': 'tag 2'}]}]

I receive the error TypeError: unhashable type: 'list'. That's because your code attempts to put a list inside a set, and Python requires set members to be hashable. Lists are mutable, so list.__hash__ is set to None and list objects cannot be hashed. Even if you resolve this by replacing the list with a tuple, you will find that dict objects are not hashable either, so a tuple containing a dict still cannot go into a set. Potential solutions include using immutable objects like tuple or frozendict (a third-party package, not a built-in type in Python 3.8), but that is another post.
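Both errors can be reproduced without stopping the script by catching the TypeError. The helper `try_set` is hypothetical, and the frozenset-of-items conversion is just one standard-library workaround; it only works when the dict values are themselves hashable (strings are).

```python
def try_set(element):
    """Hypothetical helper: return {element}, or the TypeError message if unhashable."""
    try:
        return {element}
    except TypeError as exc:
        return str(exc)


# A list inside a set: message mentions unhashable type: 'list'
print(try_set([{"tag": "tag 1"}]))

# A tuple is hashable only if its contents are; a dict inside still fails
print(try_set(({"tag": "tag 1"},)))

# Workaround: list -> tuple, each dict -> frozenset of its (key, value) items
inner = [{"tag": "tag 1"}]
hashable_inner = tuple(frozenset(d.items()) for d in inner)
print(try_set(hashable_inner))  # {(frozenset({('tag', 'tag 1')}),)}
```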

To answer your question, I have reformulated your problem as

player_info = [[[{'tag': 'tag 1'}]],
               [[{'tag': 'tag 2'}]]]

and compared the performance of A) an explicit loop:

for i in range(len(player_info)):
  print(player_info[i][0][0]['tag'])

against B) a list comprehension:

[
  print(single_player_info[0][0]['tag']) 
  for single_player_info in player_info
]
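One caveat about B: a comprehension used only to call print still builds a list, and since print returns None, that list holds nothing but None. When the goal is to collect the values, the usual pattern keeps print out of the comprehension:

```python
player_info = [[[{"tag": "tag 1"}]],
               [[{"tag": "tag 2"}]]]

# Comprehension used only for its side effect: prints each tag,
# but the list it builds contains print's return value, None
side_effect_result = [
    print(single_player_info[0][0]["tag"])
    for single_player_info in player_info
]
print(side_effect_result)  # [None, None]

# More idiomatic: build the list of values, then print (or return) it once
tags = [single_player_info[0][0]["tag"] for single_player_info in player_info]
print(tags)  # ['tag 1', 'tag 2']
```

Separating the lookups from the printing also makes the lookups themselves measurable on their own.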

Running the above code blocks in Jupyter with the %%timeit cell magic, I got:

A) 154 µs ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
B) 120 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Note: This experiment is highly skewed for at least two reasons:

I tested both trials using only the data you provided (N=2), so the scaling behavior on larger inputs may well differ from what these numbers suggest.
print takes far more time than the lookups themselves, so the measurement is dominated by output and depends heavily on the state of the kernel.
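A sketch of a less skewed comparison, under two assumptions: timing is done with the standard-library timeit module rather than %%timeit, and the input is a synthetic list of N = 10,000 entries in the reformulated nested-list shape. print is removed so only the lookups are measured. No timings are quoted here because they depend on the machine and Python version.

```python
import timeit

# Hypothetical larger input in the reformulated nested-list shape
N = 10_000
player_info = [[[{"tag": f"tag {i}"}]] for i in range(N)]


def explicit_loop():
    """A) index-based loop, collecting tags instead of printing them."""
    tags = []
    for i in range(len(player_info)):
        tags.append(player_info[i][0][0]["tag"])
    return tags


def list_comprehension():
    """B) list comprehension, also without print."""
    return [single_player_info[0][0]["tag"] for single_player_info in player_info]


# Both approaches must agree before their timings mean anything
assert explicit_loop() == list_comprehension()

NUMBER = 50  # calls per timing run
for func in (explicit_loop, list_comprehension):
    # Take the best of 5 runs; each run calls func NUMBER times
    best = min(timeit.repeat(func, number=NUMBER, repeat=5))
    print(f"{func.__name__}: {best / NUMBER * 1e6:.1f} µs per call")
```

Taking the minimum of the repeats, rather than the mean, is the approach the timeit documentation suggests for reducing noise from other processes.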

I hope this answers your question.