I have a large DataFrame, and I want to add a Series to every row of it.
This is how I currently do it:
print(df.shape) # (31676, 3562)
diff = 4.1 - df.iloc[0] # I'd like to add diff to every row of df
for i in range(len(df)):
    df.iloc[i] = df.iloc[i] + diff
This takes a lot of time. Is there a more efficient way to do this?
CodePudding user response:
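You can combine the DataFrame and the Series directly. pandas aligns the Series with the column labels and applies it to every row in one vectorized operation. Operating on a NumPy array instead of the Series should be a bit faster: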
import numpy as np
import pandas as pd
np.random.seed(2022)
df = pd.DataFrame(np.random.rand(1000, 1000))
In [51]: %timeit df.add(4.1).sub(df.iloc[0])
7.99 ms ± 389 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [52]: %timeit df + 4.1 - df.iloc[0]
8.46 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [53]: %timeit df + 4.1 - df.iloc[0].to_numpy()
7.59 ms ± 59.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [49]: %%timeit
...: diff = 4.1 - df.iloc[0] # I'd like to add diff to every row of df
...: for i in range(len(df)):
...:     df.iloc[i] = df.iloc[i] + diff
...:
433 ms ± 50.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
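As a quick sanity check, the sketch below compares the row-by-row loop from the question with the vectorized forms on a small frame. The frame size, the column names and the variable names `expected`, `vectorized` and `via_numpy` are made up for illustration; it only checks that the results agree, not how fast they are.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (shape and column names are assumptions).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 4)), columns=list("abcd"))

# The Series to add to every row, as in the question.
diff = 4.1 - df.iloc[0]

# Reference result: the row-by-row loop from the question, run on a copy.
expected = df.copy()
for i in range(len(expected)):
    expected.iloc[i] = expected.iloc[i] + diff

# Vectorized: the Series is aligned on column labels and applied to each row.
vectorized = df + diff

# Same idea with a plain NumPy array, which is broadcast across the rows
# by position instead of being aligned by label.
via_numpy = df + diff.to_numpy()

print(np.allclose(expected, vectorized), np.allclose(expected, via_numpy))
```

Note that the NumPy variant skips label alignment, so it relies on the array being in the same column order as the DataFrame.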