Arithmetic operations on large dataframe

Apologies in advance if this isn't a good question, I'm a beginner in DataFrames...

I have a large dataframe (about a thousand rows and a little over 5000 columns).

The first 5000 columns contain numbers, and I need to do some operations on each of these numbers based on the values of other columns.

For instance, I need to multiply the first 5000 numbers in a row by the value of another column in the same row.

Index	1	2	3	4	...	5000	a	b	c	d
0	0.1	0.4	0.8	0.6	...	0.3	3	7	2	9
1	0.7	0.5	0.4	0.8	...	0.1	4	6	1	3
...	...	...	...	...	...	...	...	...	...	...
1000	0.2	0.5	0.1	0.9	...	0.6	6	8	5	4

This is an example of code that multiplies my numbers by column "a", then multiplies by a constant, and then takes the exponential of the result:

a_col = df.columns.get_loc("a")

df.iloc[:, :5000] = np.exp(df.iloc[:, :5000] * df.iloc[:, [a_col]].to_numpy() * np.sqrt(4))

While the results look fine, it does feel slow, especially compared to the code I'm trying to replace, which did these operations row by row in a loop.

Is this the proper way to do what I'm trying to achieve, or am I doing something wrong?

Thank you for your help!

CodePudding user response：

Use the .values attribute to get the NumPy arrays and np.newaxis to turn df.a into a column vector, so that the multiplication broadcasts row-wise:

df.iloc[:, :5000] = np.exp(df.iloc[:, :5000].values * df.a.values[:, np.newaxis] * np.sqrt(4))
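
To see what the broadcasting does, here is a minimal sketch on a tiny frame. The three numeric columns stand in for the 5000 in the question, and the names n_num, block and a_column are made up for the illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame: 3 numeric columns standing in for the 5000, plus "a"
df = pd.DataFrame({
    "1": [0.1, 0.7],
    "2": [0.4, 0.5],
    "3": [0.8, 0.4],
    "a": [3, 4],
})
n_num = 3  # number of leading numeric columns (5000 in the question)

block = df.iloc[:, :n_num].to_numpy()         # plain 2-D array, shape (2, 3)
a_column = df["a"].to_numpy()[:, np.newaxis]  # column vector, shape (2, 1)

# The (2, 1) column is broadcast across the 3 columns of the (2, 3) block,
# so every number in a row is multiplied by that row's "a" value.
result = np.exp(block * a_column * np.sqrt(4))

# Write the result back into the numeric columns only; "a" is left untouched.
df.iloc[:, :n_num] = result
```

Doing the arithmetic on plain arrays and assigning back once should avoid most of the per-operation pandas overhead, though how much that helps will depend on the data and the pandas version.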

CodePudding user response：

Try this:

df.iloc[:, :5000] = np.exp(df.iloc[:, :5000].values * df["a"].to_numpy().reshape(-1, 1) * np.sqrt(4))

It took just a few seconds to run (for the 5 million cells).

If it works, I'll explain it :)
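
As a rough sanity check (not the answerer's promised explanation), the sketch below compares the vectorized expression against a row-by-row loop like the one mentioned in the question, on randomly generated data. The sizes and the variable names are assumptions chosen for the illustration; it also shows that reshape(-1, 1) and [:, np.newaxis] produce the same column vector:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_num = 1000, 50  # hypothetical sizes, smaller than the question's 1000 x 5000

data = rng.random((n_rows, n_num))
df = pd.DataFrame(data, columns=[str(i + 1) for i in range(n_num)])
df["a"] = rng.integers(1, 10, size=n_rows)

# Row-by-row version, similar in spirit to the loop being replaced
looped = np.empty((n_rows, n_num))
for i in range(n_rows):
    looped[i] = np.exp(data[i] * df["a"].iloc[i] * np.sqrt(4))

# Two equivalent ways to turn "a" into an (n_rows, 1) column vector
col_reshape = df["a"].to_numpy().reshape(-1, 1)
col_newaxis = df["a"].to_numpy()[:, np.newaxis]

# Vectorized version: one broadcasted expression over the whole block
vectorized = np.exp(df.iloc[:, :n_num].to_numpy() * col_reshape * np.sqrt(4))

print(np.array_equal(col_reshape, col_newaxis))
print(np.allclose(looped, vectorized))
```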