Here is my initial dictionary:
dic = {'key1': [2,3],
'key2': [5,1],
'key3': [6,8]}
NB: I am using simple numbers 2,3, etc. for the purpose of the example (list of dfs on my side).
For each key, I would like to extract the first element and obtain the following result:
dic2 = {'key1': 2,
'key2': 5,
'key3': 6}
Is it possible to do it without going for a slow for loop ? The dictionary is quite large...
Many thanks in advance for your help.
CodePudding user response:
If you expect to access only a few keys of the dictionary after this conversion, you could write a wrapper; something like:
class ViewFirst:
def __init__(self, original):
self.original = original
def __getitem__(self, key):
return self.original[key][0]
Another option is to base it on defaultdict
; that will let you do things like assigning new values into the dictionary (new or existing keys) while still retrieving other values from the original dictionary:
class DictFromFirsts(collections.defaultdict):
def __init__(self, original):
self.original = original
def __missing__(self, key):
return self.original[key][0]
CodePudding user response:
A dictionary comprehension is the way to go here:
{k: v[0] for k, v in dic.items()}
Or use operator.itemgetter
:
>>> from operator import itemgetter
>>> dict(zip(dic, map(itemgetter(0), dic.values())))
{'key1': 2, 'key2': 5, 'key3': 6}
>>>
CodePudding user response:
you can try this:
dict(zip(dic.keys(),list(list(zip(*dic.values()))[0])))
Output:
{'key1': 2, 'key2': 5, 'key3': 6}
CodePudding user response:
Let's see how "slow" a for loop really is here. My solution:
dic2 = {}
for k, (dic2[k], _) in dic.items():
pass
Benchmark with your toy dict:
600 ns 602 ns 603 ns U12_Forward_1
1019 ns 1025 ns 1027 ns U12_Forward_2
1347 ns 1350 ns 1355 ns user1740577
441 ns 442 ns 443 ns dont_talk_just_code
Benchmark with a "large" dict of 12k items as you commented:
1412624 ns 1414927 ns 1418089 ns U12_Forward_1
1687464 ns 1690134 ns 1696759 ns U12_Forward_2
1961205 ns 1986729 ns 2005884 ns user1740577
1248901 ns 1260306 ns 1261295 ns dont_talk_just_code
The above were done on tio.run, which I use for its high speed and speed stability. Sadly it doesn't offer cytoolz
needed by Marco's answer, so I couldn't include that. Then @user1740577 accused me of lying because I didn't include it, so here are results from replit.com, where I can (note that if you run it there and don't have a paid account, your times will be slower):
475 ns 476 ns 484 ns U12_Forward_1
804 ns 805 ns 807 ns U12_Forward_2
1075 ns 1079 ns 1082 ns user1740577
442 ns 444 ns 448 ns Marco_DG
360 ns 360 ns 360 ns dont_talk_just_code
1060461 ns 1061449 ns 1071588 ns U12_Forward_1
1294079 ns 1330157 ns 1706065 ns U12_Forward_2
1593082 ns 1594114 ns 1596703 ns user1740577
1268663 ns 1274264 ns 1286715 ns Marco_DG
964445 ns 965971 ns 966333 ns dont_talk_just_code
Full benchmark code (also at replit:
from timeit import repeat
from functools import partial
from operator import itemgetter
from cytoolz import valmap , first
def U12_Forward_1(dic):
return {k: v[0] for k, v in dic.items()}
def U12_Forward_2(dic):
return dict(zip(dic, map(itemgetter(0), dic.values())))
def user1740577(dic):
return dict(zip(dic.keys(),list(list(zip(*dic.values()))[0])))
def Marco_DG(dic):
return valmap(first, dic)
def dont_talk_just_code(dic):
dic2 = {}
for k, (dic2[k], _) in dic.items():
pass
return dic2
funcs = U12_Forward_1, U12_Forward_2, user1740577, Marco_DG, dont_talk_just_code
def bench(dic, number):
expect = funcs[0](dic)
for func in funcs:
result = func(dic)
print(result == expect, func.__name__)
print()
for _ in range(3):
for func in funcs:
ts = sorted(repeat(partial(func, dic), number=number))[:3]
print(*('} ns ' % (t / number * 1e9) for t in ts), func.__name__)
print()
bench({'key1': [2,3], 'key2': [5,1], 'key3': [6,8]}, 100000)
bench({f'key{i}': [i,42] for i in range(12000)}, 30)
CodePudding user response:
Personally I would go with a library that does that using Cython under the hood: cytoolz
pip3 install cytoolz
from cytoolz import valmap , first
dic = {'key1': [2,3],
'key2': [5,1],
'key3': [6,8]}
dic2 = valmap(first, dic)
I will extend @don'ttalkjustcode tests with my function benchmark, cleary his one is the fastest one, with the only caveat of being limited to 2-elements arrays. I'm still trying to figure out how to test the one by @Jiří Baum.
422 ns 422 ns 422 ns marco_dg
462 ns 462 ns 462 ns U12_Forward_1
765 ns 766 ns 769 ns U12_Forward_2
1076 ns 1081 ns 1088 ns user1740577
341 ns 341 ns 341 ns dont_talk_just_code
.
1206537 ns 1208705 ns 1211105 ns marco_dg
1009374 ns 1011324 ns 1011989 ns U12_Forward_1
1232356 ns 1232728 ns 1251990 ns U12_Forward_2
1380953 ns 1382381 ns 1390140 ns user1740577
848863 ns 850010 ns 850450 ns dont_talk_just_code