I ran into the problem of getting values from a list of dictionaries, where each dictionary holds a list that in turn contains a dictionary. It may sound easy, but it took me some time, so I think posting it could be useful for other people. An example of my data:
player_info = [{[{'tag': 'tag 1'}]}, {[{'tag': 'tag 2'}]}]
The outer list is called 'player_info'. It contains 25 dictionaries, each of which has (among other things) an 'opponent' key whose value is a list that contains a dictionary (yeah, pretty messy). From that innermost dictionary, I wanted the value associated with the 'tag' key.
I figured out two ways:
- Create a loop.
for i in range(25): print(player_info[i]['opponent'][0]['tag'])
- Iterate through the list:
{each_dictionary['opponent'][0]['tag'] for each_dictionary in player_info}
I assume that the second way must be more efficient. Let me know what you think, and whether there is a smarter way to do it.
CodePudding user response:
First: dicts require a key-value association for every element in the dictionary. Your second-level data structure, though, does not include keys: ({[{'tag': 'tag 1'}]}). This is a set. Unlike dicts, sets do not have keys associated with their elements. So your data structure looks like List[Set[List[Dict[str, str]]]].
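As a minimal sketch of the distinction, braces holding key: value pairs build a dict, while braces holding bare elements build a set. The variable names below are illustrative only.

```python
# Braces with key: value pairs create a dict.
as_dict = {'tag': 'tag 1'}

# Braces with bare elements create a set (no keys).
as_set = {'tag 1', 'tag 2'}

print(type(as_dict).__name__)  # dict
print(type(as_set).__name__)   # set

# Empty braces are a dict, not a set; use set() for an empty set.
print(type({}).__name__)       # dict
```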
Second: when I try to run
# python 3.8.8
player_info = [{[{'tag': 'tag 1'}]},
{[{'tag': 'tag 2'}]}]
I receive the error TypeError: unhashable type: 'list'. That's because your code attempts to put a list inside a set. Set membership in Python requires the members to be hashable, and list objects do not define a usable __hash__() method. Even if you resolve this by replacing the list with a tuple, you will find that dict objects are not hashable either. Potential solutions include using immutable objects like frozendict or tuple, but that is another post.
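The hashability rule can be checked directly. The sketch below is only illustrative: the helper name is_hashable is hypothetical (it is not part of the standard library), and the dict-based layout at the end is an assumption based on the 'opponent' key used in the question's loop.

```python
def is_hashable(obj):
    """Return True if obj can be hashed (hypothetical helper for illustration)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable([1, 2]))             # False: lists are mutable
print(is_hashable((1, 2)))             # True: tuple of hashable items
print(is_hashable(({'tag': 'x'},)))    # False: the tuple contains a dict
print(is_hashable(frozenset({1, 2})))  # True

# Assumed valid layout: each player is a dict keyed by 'opponent'.
player_info = [
    {'opponent': [{'tag': 'tag 1'}]},
    {'opponent': [{'tag': 'tag 2'}]},
]
tags = [player['opponent'][0]['tag'] for player in player_info]
print(tags)  # ['tag 1', 'tag 2']
```

With keys in place, the second level is a real dict, so nothing needs to be hashed except the string keys.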
To answer your question, I have reformulated your problem as
player_info = [[[{'tag': 'tag 1'}]],
[[{'tag': 'tag 2'}]]]
and compared the performance difference with A) explicit loops:
for i in range(len(player_info)):
    print(player_info[i][0][0]['tag'])
against B) a list comprehension:
[
    print(single_player_info[0][0]['tag'])
    for single_player_info in player_info
]
Running the above code blocks in Jupyter with the %%timeit cell magic, I got:
A) 154 µs ± 14.6 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
and
B) 120 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Note: This experiment is highly skewed for at least two reasons:
- I tested both trials using only the data you provided (N=2). With larger inputs, the two approaches may well scale differently than this small test suggests.
- print consumes a lot of time and makes the measurement heavily dependent on the state of the kernel.
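To reduce both sources of skew, one could time the two approaches without print and on a larger list. The following is a rough sketch using the standard timeit module. The list size (N = 1000), the run count, and the function names are assumptions chosen for illustration, and no timings are claimed here.

```python
import timeit

N = 1000  # assumed size, chosen only for illustration
player_info = [[[{'tag': f'tag {i}'}]] for i in range(N)]

def with_loop():
    # A) explicit index-based loop, collecting instead of printing
    tags = []
    for i in range(len(player_info)):
        tags.append(player_info[i][0][0]['tag'])
    return tags

def with_comprehension():
    # B) list comprehension over the items directly
    return [single[0][0]['tag'] for single in player_info]

# Both approaches should yield the same result.
assert with_loop() == with_comprehension()

for func in (with_loop, with_comprehension):
    seconds = timeit.timeit(func, number=200)
    print(f'{func.__name__}: {seconds:.4f} s for 200 runs')
```

Collecting the tags into a list keeps the work comparable between the two versions while leaving I/O out of the measurement.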
I hope this answers your question.