The problem is simple: the input is a list of non-container objects (int, str, etc.), and every element of the list appears in one column of a DataFrame. The task is, for each element of the list, to find the corresponding object in another column of the same row (only its value, not the array).
The problem is better demonstrated in code:
from pandas import DataFrame
digits = '0123456789abcdef'
df = DataFrame([(a,b) for a, b in zip(digits, range(16))], columns=['hex', 'dec'])
df
df.loc[df.dec == 12, 'hex']
df.loc[df.dec == 12, 'hex'].values[0]
import random
eight = random.sample(range(16), 8)
eight
fun = lambda x: df.loc[df.dec == x, 'hex'].values[0]
''.join(fun(i) for i in eight)
''.join(map(fun, eight))
As you can see, I can already do this, but I am using a for loop and the performance isn't very impressive. I know pandas and numpy are all about vectorization, so I wonder whether there is a built-in way to do this.
In [1]: from pandas import DataFrame
In [2]: digits = '0123456789abcdef'
In [3]: df = DataFrame([(a,b) for a, b in zip(digits, range(16))], columns=['hex', 'dec'])
In [4]: df
Out[4]:
   hex  dec
0    0    0
1    1    1
2    2    2
3    3    3
4    4    4
5    5    5
6    6    6
7    7    7
8    8    8
9    9    9
10   a   10
11   b   11
12   c   12
13   d   13
14   e   14
15   f   15
In [5]: df.loc[df.dec == 12, 'hex']
Out[5]:
12 c
Name: hex, dtype: object
In [6]: df.loc[df.dec == 12, 'hex'].values[0]
Out[6]: 'c'
In [7]: import random
In [8]: eight = random.sample(range(16), 8)
In [9]: eight
Out[9]: [9, 7, 1, 6, 11, 12, 14, 10]
In [10]: fun = lambda x: df.loc[df.dec == x, 'hex'].values[0]
In [11]: ''.join(fun(i) for i in eight)
Out[11]: '9716bcea'
In [12]: ''.join(map(fun, eight))
Out[12]: '9716bcea'
In [13]: %timeit ''.join(fun(i) for i in eight)
2.34 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [14]: %timeit ''.join(map(fun, eight))
2.34 ms ± 134 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So what is a vectorized way to achieve the same result as the method demonstrated in the code?
Answer:
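A vectorized way would be to construct a Series that is indexed by the lookup column (dec) and holds the target column (hex), then select all the wanted labels in one indexing call: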
series = df.set_index('dec')['hex']
''.join(series[eight])
Output: '9716bcea'
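For comparison, here is a minimal self-contained sketch of the same lookup written three ways. It assumes the values in dec are unique and that every element of the list is present in dec; the names lookup, mapping, via_loc, via_map and via_dict are only illustrative and do not come from the question.

```python
import pandas as pd

digits = '0123456789abcdef'
df = pd.DataFrame({'hex': list(digits), 'dec': range(16)})
eight = [9, 7, 1, 6, 11, 12, 14, 10]  # the sample from the question

# Series indexed by 'dec' whose values are 'hex' (assumes unique 'dec' values).
lookup = df.set_index('dec')['hex']

# 1. Explicit label-based selection; raises KeyError if a label is missing.
via_loc = ''.join(lookup.loc[eight])

# 2. Series.map with a Series argument; missing keys would become NaN instead.
via_map = ''.join(pd.Series(eight).map(lookup))

# 3. Plain dict built once; often sufficient for small inputs.
mapping = dict(zip(df['dec'], df['hex']))
via_dict = ''.join(mapping[x] for x in eight)

print(via_loc, via_map, via_dict)
```

Using .loc makes it explicit that the list is treated as index labels rather than positions, which matters when the index is made of integers. Which variant is fastest depends on the size of the list and of the DataFrame, so it is worth timing them on real data.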