Apply function on pandas using the index

Time:11-15

I have a dataframe like this:

import pandas as pd

col1 = [i for i in range(10)]
col2 = [i**2 for i in range(10)]
df = pd.DataFrame(list(zip(col1, col2)), columns=['col1', 'col2'])

I want to create a new column using apply that adds the numbers in each row and then adds the index. Something like:

df['col3'] = df.apply(lambda x: x['col1'] + x['col2'] + index(x))

But of course index(x) does not work.

How can I do it in this setting?

CodePudding user response:

Your approach works with axis=1 and x.name (the row's index label), but it is slow because apply loops over the rows:

df['col3'] = df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)

The vectorized solution is to add df.index directly:

df['col3'] = df['col1'] + df['col2'] + df.index
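
As a quick sanity check, here is a minimal sketch that rebuilds the frame from the question and compares the two approaches. With axis=1, each row is passed to the lambda as a Series whose name attribute holds that row's index label, which is why x.name works. The variable names col3_apply and col3_vec are only illustrative and are not part of the original code.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'col1': list(range(10)),
                   'col2': [i**2 for i in range(10)]})

# Row-wise: x is one row as a Series, x.name is its index label
col3_apply = df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)

# Vectorized: add the whole index in one operation
col3_vec = df['col1'] + df['col2'] + df.index

# Prints True if both approaches agree on every row
print((col3_apply == col3_vec).all())
```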

Performance on 10k rows of sample data:

import numpy as np

N = 10000
df = pd.DataFrame({'col1': np.arange(N),
                   'col2': np.arange(N) ** 2})

In [234]: %timeit df['col3'] = df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)
131 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [235]: %timeit df['col3'] = df['col1'] + df['col2'] + df.index
654 µs ± 90.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
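
The %timeit magic only exists in IPython/Jupyter. To repeat the comparison in a plain script, something like the following sketch with the standard timeit module should work. The helper names with_apply and vectorized and the repeat count of 3 are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

N = 10000
df = pd.DataFrame({'col1': np.arange(N),
                   'col2': np.arange(N) ** 2})

def with_apply():
    # Row-wise version: calls the lambda once per row
    return df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)

def vectorized():
    # Vectorized version: whole-column arithmetic
    return df['col1'] + df['col2'] + df.index

# Average wall-clock time per call over 3 runs each
t_apply = timeit.timeit(with_apply, number=3) / 3
t_vec = timeit.timeit(vectorized, number=3) / 3

print(f"apply:      {t_apply * 1e3:.1f} ms per call")
print(f"vectorized: {t_vec * 1e3:.3f} ms per call")
```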

CodePudding user response:

There is no need for a function in such a simple case; a vectorized addition will be much more efficient:

df['col3'] = df['col1'] + df['col2'] + df.index

Output:

   col1  col2  col3
0     0     0     0
1     1     1     3
2     2     4     8
3     3     9    15
4     4    16    24
5     5    25    35
6     6    36    48
7     7    49    63
8     8    64    80
9     9    81    99