Given a list of values in a column in pandas DataFrame, how to output values from another column in

Time:10-06

The problem is simple. The input is a list of non-container objects (int, str, etc.), and every element of the list appears in one column of a DataFrame. For each element of the list, the task is to find the corresponding value in another column of the same row (the scalar value itself, not an array).

The problem will be better demonstrated in code:

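from pandas import DataFrame

# Build a 16-row table pairing each hex digit with its decimal value
digits = '0123456789abcdef'
df = DataFrame([(a, b) for a, b in zip(digits, range(16))], columns=['hex', 'dec'])
df

# A boolean-mask lookup returns a Series; .values[0] extracts the scalar
df.loc[df.dec == 12, 'hex']
df.loc[df.dec == 12, 'hex'].values[0]

# Pick 8 distinct decimal values and look each one up individually
import random
eight = random.sample(range(16), 8)
eight
fun = lambda x: df.loc[df.dec == x, 'hex'].values[0]
''.join(fun(i) for i in eight)
''.join(map(fun, eight))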

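As you can see, I can already do this, but it relies on a Python-level loop that looks up one element at a time, and the performance isn't very impressive. pandas and NumPy are all about vectorization, so I wonder whether there is a built-in way to do this. Here is the same code run in an IPython session, with output and timings: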

In [1]: from pandas import DataFrame

In [2]: digits = '0123456789abcdef'

In [3]: df = DataFrame([(a,b) for a, b in zip(digits, range(16))], columns=['hex', 'dec'])

In [4]: df
Out[4]:
   hex  dec
0    0    0
1    1    1
2    2    2
3    3    3
4    4    4
5    5    5
6    6    6
7    7    7
8    8    8
9    9    9
10   a   10
11   b   11
12   c   12
13   d   13
14   e   14
15   f   15

In [5]: df.loc[df.dec == 12, 'hex']
Out[5]:
12    c
Name: hex, dtype: object

In [6]: df.loc[df.dec == 12, 'hex'].values[0]
Out[6]: 'c'

In [7]: import random

In [8]: eight = random.sample(range(16), 8)

In [9]: eight
Out[9]: [9, 7, 1, 6, 11, 12, 14, 10]

In [10]: fun = lambda x: df.loc[df.dec == x, 'hex'].values[0]

In [11]: ''.join(fun(i) for i in eight)
Out[11]: '9716bcea'

In [12]: ''.join(map(fun, eight))
Out[12]: '9716bcea'

In [13]: %timeit ''.join(fun(i) for i in eight)
2.34 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [14]: %timeit ''.join(map(fun, eight))
2.34 ms ± 134 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So what is a vectorized way to achieve the same result as the method demonstrated in the code?

CodePudding user response:

A vectorized way would be to construct a Series of the 'hex' values indexed by 'dec', then select from it with the whole list at once:

series = df.set_index('dec')['hex']
''.join(series[eight])

Output: '9716bcea'
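
For reference, the approach above can be written as a self-contained sketch. It assumes the values in 'dec' are unique, since duplicate keys would return more than one row per lookup. The names `lookup` and `wanted` are illustrative and not part of the original post, and `wanted` is simply the sample list shown in the question's session. The `Series.map` variant at the end is an alternative that should yield NaN for keys that are missing rather than raising an error.

```python
import pandas as pd

digits = "0123456789abcdef"
df = pd.DataFrame({"hex": list(digits), "dec": range(16)})

# Build the lookup once: 'dec' becomes the index, 'hex' the values.
# Assumption: 'dec' contains no duplicate values.
lookup = df.set_index("dec")["hex"]

# Sample list taken from the question's session (Out[9]).
wanted = [9, 7, 1, 6, 11, 12, 14, 10]

# .loc with a list selects by label and keeps the order of the list.
result = "".join(lookup.loc[wanted])
print(result)  # 9716bcea

# Alternative: map the list through the lookup Series.
# Keys absent from the index would come back as NaN instead of raising.
mapped = pd.Series(wanted).map(lookup)
print("".join(mapped))  # 9716bcea
```

Using `.loc` explicitly makes it clear that the selection is by label. With an integer index, plain `series[eight]` behaves the same way, but `.loc` avoids any ambiguity with positional indexing.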
