I have a dataframe like this:
import pandas as pd
col1=[i for i in range(10)]
col2=[i**2 for i in range(10)]
df=pd.DataFrame(list(zip(col1,col2)),columns=['col1','col2'])
I want to create a new column using apply that adds the numbers in each row and then adds the index. Something like:
df['col3']=df.apply(lambda x:x['col1'] + x['col2'] + index(x))
But of course index(x) does not work.
How can I do it in this setting?
CodePudding user response:
Your approach works with axis=1 and x.name (the row's index label), but apply loops over the rows in Python, so it is slow:
df['col3'] = df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)
The vectorized solution is to add df.index directly:
df['col3'] = df['col1'] + df['col2'] + df.index
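A minimal, self-contained sketch of the two approaches above, assuming the default integer RangeIndex from the question. The names `via_apply` and `vectorized` are illustrative only; the final line checks that both approaches produce the same values.

```python
import pandas as pd

col1 = [i for i in range(10)]
col2 = [i**2 for i in range(10)]
df = pd.DataFrame(list(zip(col1, col2)), columns=['col1', 'col2'])

# Row-wise: with axis=1 each row is passed as a Series,
# and its .name attribute holds that row's index label.
via_apply = df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)

# Vectorized: add the index element-wise to the column sum.
vectorized = df['col1'] + df['col2'] + df.index

df['col3'] = vectorized
print((via_apply == vectorized).all())
```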
Performance on a 10k-row sample:
N = 10000
df=pd.DataFrame({'col1':np.arange(N),
'col2':np.arange(N) ** 2})
In [234]: %timeit df['col3'] = df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)
131 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [235]: %timeit df['col3'] = df['col1'] + df['col2'] + df.index
654 µs ± 90.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
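Outside IPython, where `%timeit` is not available, a similar comparison can be sketched with the standard-library `timeit` module. This is only a rough sketch: the helper names `with_apply` and `with_index` and the repeat counts are assumptions chosen for illustration, and absolute timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

N = 10000
df = pd.DataFrame({'col1': np.arange(N),
                   'col2': np.arange(N) ** 2})

def with_apply():
    # Row-wise version: one Python-level lambda call per row.
    df['col3'] = df.apply(lambda x: x['col1'] + x['col2'] + x.name, axis=1)

def with_index():
    # Vectorized version: whole-column arithmetic.
    df['col3'] = df['col1'] + df['col2'] + df.index

# Best of 3 repeats, reported as seconds per single call.
t_apply = min(timeit.repeat(with_apply, number=1, repeat=3))
t_index = min(timeit.repeat(with_index, number=10, repeat=3)) / 10

print(f"apply:      {t_apply * 1e3:.1f} ms per call")
print(f"vectorized: {t_index * 1e3:.3f} ms per call")
```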
CodePudding user response:
There is no need for a function in such a simple case; a vectorized addition will be much more efficient:
df['col3'] = df['col1'] + df['col2'] + df.index
Output:
   col1  col2  col3
0     0     0     0
1     1     1     3
2     2     4     8
3     3     9    15
4     4    16    24
5     5    25    35
6     6    36    48
7     7    49    63
8     8    64    80
9     9    81    99