Efficient way to add a pandas Series to a pandas DataFrame

Time:10-24

I have a very large DataFrame, and I want to add a Series to every row of it.

This is how I currently do it:

print(df.shape)  # (31676, 3562)
diff = 4.1 - df.iloc[0] # I'd like to add diff to every row of df
for i in range(len(df)):
    df.iloc[i] = df.iloc[i] + diff

This takes a long time. Is there a more efficient way to do it?

Answer:

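Instead of looping over rows, use a single vectorized operation on the whole DataFrame: add the scalar and subtract the Series, and pandas broadcasts the Series across every row. Subtracting a NumPy array (via to_numpy()) instead of a Series should be slightly faster: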

import numpy as np
import pandas as pd

np.random.seed(2022)
df = pd.DataFrame(np.random.rand(1000, 1000))


In [51]: %timeit df.add(4.1).sub(df.iloc[0])
7.99 ms ± 389 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [52]: %timeit df + 4.1 - df.iloc[0]
8.46 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [53]: %timeit df + 4.1 - df.iloc[0].to_numpy()
7.59 ms ± 59.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
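A small sketch of why both forms work, using a made-up 2x3 frame with columns a, b, c (the data and labels are arbitrary, chosen only for illustration): adding a Series matches its index against the DataFrame's column labels and repeats it down every row, while adding a 1-D NumPy array skips the label matching and broadcasts by position, so the array needs exactly one value per column, in column order.

```python
import numpy as np
import pandas as pd

# Hypothetical tiny frame with explicit column labels so the alignment is visible
df = pd.DataFrame([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]], columns=["a", "b", "c"])

# Series indexed by the column labels a, b, c
diff = 4.1 - df.iloc[0]

# Series arithmetic: diff is matched to df's columns by label
# and broadcast down every row (same as df.add(diff, axis="columns"))
by_label = df + diff

# NumPy arithmetic: the 1-D array is broadcast by position,
# one value per column, in column order
by_position = df + diff.to_numpy()

print(by_label)
print(np.allclose(by_label.to_numpy(), by_position.to_numpy()))
```

The positional version does no label matching, so it relies on diff having been derived from the same frame, as it is here. With a Series whose labels do not match the columns, the label-based version would instead produce NaN columns.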
    

In [49]: %%timeit
    ...: diff = 4.1 - df.iloc[0] # I'd like to add diff to every row of df
    ...: for i in range(len(df)):
    ...:     df.iloc[i] = df.iloc[i] + diff
    ...:     
433 ms ± 50.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
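To check that the vectorized expression reproduces the loop from the question, the sketch below runs both on a small random frame (50 x 20, an arbitrary size chosen so the loop finishes quickly) and compares the results. It uses np.allclose rather than exact equality because the loop computes row + (4.1 - first_row) while the vectorized form computes (row + 4.1) - first_row, and the two can differ in the last floating-point bit.

```python
import numpy as np
import pandas as pd

np.random.seed(2022)
df = pd.DataFrame(np.random.rand(50, 20))  # small stand-in for the real frame

# Original approach: update one row at a time on a copy
looped = df.copy()
diff = 4.1 - looped.iloc[0]  # computed once, before any row is modified
for i in range(len(looped)):
    looped.iloc[i] = looped.iloc[i] + diff

# Vectorized approach: one operation over the whole frame
vectorized = df + 4.1 - df.iloc[0].to_numpy()

# Compare values with a floating-point tolerance
print(np.allclose(looped.to_numpy(), vectorized.to_numpy()))
```

In the question's code, the whole loop can then be replaced by df = df + diff, with diff computed as before.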