Arithmetic operations on large dataframe

Time:12-12

Apologies in advance if this isn't a good question; I'm a beginner with DataFrames...

I have a large dataframe (about a thousand rows and just over 5000 columns).

The first 5000 columns contain numbers, and I need to do some operations on each of these numbers based on the values of other columns.

For instance, multiply the first 5000 numbers in a row by the value of another column in the same row.

Index 1 2 3 4 ... 5000 a b c d
0 0.1 0.4 0.8 0.6 ... 0.3 3 7 2 9
1 0.7 0.5 0.4 0.8 ... 0.1 4 6 1 3
... ... ... ... ... ... ... ... ... ... ...
1000 0.2 0.5 0.1 0.9 ... 0.6 6 8 5 4

Here is an example of code that multiplies my numbers by column "a", then by a constant, and then takes the exponential of the result:

a_col = df.columns.get_loc("a")

df.iloc[: , : 5000 ] = np.exp (df.iloc[: ,  : 5000 ] * df.iloc[: , [a_col]].to_numpy() * np.sqrt(4) )

While the results look fine, it does feel slow, especially compared to the code I'm trying to replace, which did these operations row by row in a loop.

Is this the proper way to do what I'm trying to achieve, or am I doing something wrong?

Thank you for your help !

CodePudding user response:

Use the .values attribute to get the NumPy arrays and np.newaxis to turn df.a into a column vector, so that the multiplication broadcasts across each row:

df.iloc[: , : 5000 ] = np.exp(df.iloc[: ,  : 5000 ].values * df.a.values[:, np.newaxis] * np.sqrt(4) )
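To check that the broadcasting does what is intended, here is a minimal sketch on a toy frame. The 3 numeric columns stand in for the 5000 in the question, and the names n, block, and expected are only illustrative; it uses to_numpy(), which returns the same arrays as .values here.

```python
import numpy as np
import pandas as pd

# Toy frame: 3 numeric columns standing in for the 5000, plus the "a" column.
df = pd.DataFrame({
    "1": [0.1, 0.7, 0.2],
    "2": [0.4, 0.5, 0.5],
    "3": [0.8, 0.4, 0.1],
    "a": [3.0, 4.0, 6.0],
})
n = 3  # number of leading numeric columns (5000 in the question)

block = df.iloc[:, :n].to_numpy()        # shape (3, 3)
a = df["a"].to_numpy()[:, np.newaxis]    # shape (3, 1): one factor per row
result = np.exp(block * a * np.sqrt(4))  # a broadcasts across the columns

# Reference: the same computation done row by row in a Python loop.
expected = np.array(
    [np.exp(row * a_i * 2.0) for row, a_i in zip(block, df["a"])]
)
assert np.allclose(result, expected)

# Write the result back into the leading columns.
df.iloc[:, :n] = result
```

The row-by-row reference is only there to confirm the result; on the full-size frame the vectorized line is the one to keep.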

CodePudding user response:

Try this:

df.iloc[:, :5000] = np.exp(df.iloc[:, :5000].values * df["a"].to_numpy().reshape(-1, 1) * np.sqrt(4))

It took just a few seconds to run (for the 5 million cells).

If it works, I'll explain it :)
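As a rough sketch of why the reshape matters (the small arrays below are made up for illustration, not taken from the question): reshape(-1, 1) turns the flat vector of per-row factors into a single column, which NumPy can then broadcast across every column of the block. Without it, the shapes do not line up.

```python
import numpy as np

block = np.arange(6, dtype=float).reshape(2, 3)  # 2 rows, 3 columns
a = np.array([10.0, 100.0])                      # one factor per row, shape (2,)

col = a.reshape(-1, 1)    # shape (2, 1); -1 lets NumPy infer the row count
same = a[:, np.newaxis]   # equivalent spelling of the same reshape

scaled = block * col      # each row is multiplied by its own factor

# The flat vector has shape (2,), which cannot broadcast against (2, 3).
try:
    block * a
    mismatch = False
except ValueError:
    mismatch = True
```

Here scaled is [[0, 10, 20], [300, 400, 500]]: the first row is multiplied by 10 and the second by 100.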
