Apologies in advance if this isn't a good question, I'm a beginner in DataFrames...
I have a large dataframe (about a thousand rows and 5000 columns).
The first 5000 columns contain numbers, and I need to do some operations on each of these numbers based on the values of other columns.
For instance, multiply the first 5000 numbers in a row by the value of another column in the same row.
Index | 1 | 2 | 3 | 4 | ... | 5000 | a | b | c | d |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0.1 | 0.4 | 0.8 | 0.6 | ... | 0.3 | 3 | 7 | 2 | 9 |
1 | 0.7 | 0.5 | 0.4 | 0.8 | ... | 0.1 | 4 | 6 | 1 | 3 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1000 | 0.2 | 0.5 | 0.1 | 0.9 | ... | 0.6 | 6 | 8 | 5 | 4 |
This is an example of code that multiplies my numbers by the column "a", then multiplies by a constant, and then takes the exponential of the result:
a_col = df.columns.get_loc("a")
df.iloc[:, :5000] = np.exp(df.iloc[:, :5000] * df.iloc[:, [a_col]].to_numpy() * np.sqrt(4))
While the results look fine, it does feel slow, especially compared to the code I'm trying to replace, which was doing these operations row by row in a loop.
Is this the proper way to do what I'm trying to achieve, or am I doing something wrong ?
Thank you for your help !
CodePudding user response:
Use the .values attribute to get the underlying NumPy arrays, and np.newaxis to turn df.a into a column vector so that the multiplication is applied row-wise:
df.iloc[: , : 5000 ] = np.exp(df.iloc[: , : 5000 ].values * df.a.values[:, np.newaxis] * np.sqrt(4) )
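To see what the line above does, here is a minimal sketch on a toy frame. The 2x3 numeric block, the variable names and `n = 3` are assumptions for illustration and stand in for the 5000 columns from the question:

```python
import numpy as np
import pandas as pd

# Toy frame: 3 numeric columns stand in for the 5000, plus the multiplier column "a".
df = pd.DataFrame(
    {"1": [0.1, 0.7], "2": [0.4, 0.5], "3": [0.8, 0.4], "a": [3, 4]}
)
n = 3  # number of leading numeric columns (5000 in the question)

block = df.iloc[:, :n].to_numpy()        # plain NumPy array, shape (2, 3)
a = df["a"].to_numpy()[:, np.newaxis]    # shape (2, 1): one multiplier per row

# (2, 3) * (2, 1) broadcasts, so every value in row i is multiplied by a[i].
result = np.exp(block * a * np.sqrt(4))
df.iloc[:, :n] = result
```

Working on the NumPy arrays means pandas does not have to align indexes and column labels during the arithmetic; the frame is only touched once, when the result is written back.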
CodePudding user response:
Try this:
df.iloc[:, :5000] = np.exp(df.iloc[:, :5000].values * df["a"].to_numpy().reshape(-1, 1) * np.sqrt(4))
It took just a few seconds to run (for the 5 million cells).
If it works, I'll explain it :)
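For what it's worth, the key step here is the reshape(-1, 1). A small sketch of why it is needed, with made-up stand-in arrays (`a`, `block` and the other names are assumptions, not taken from the question's data):

```python
import numpy as np

a = np.array([3, 4, 6])          # stand-in for df["a"].to_numpy(), shape (3,)
block = np.ones((3, 4))          # stand-in for the numeric block, shape (3, 4)

col = a.reshape(-1, 1)           # -1 lets NumPy infer the row count: shape (3, 1)
same = np.array_equal(col, a[:, np.newaxis])  # same array as the np.newaxis spelling

scaled = block * col             # (3, 4) * (3, 1): row i is multiplied by a[i]

# Without the reshape, the shapes (3, 4) and (3,) cannot be broadcast together.
try:
    block * a
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
```

So reshape(-1, 1) and [:, np.newaxis] are two ways to get the same column vector, and either one lets NumPy stretch the multiplier across all the columns of each row.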