I have some time series data that requires multiplying constants by variables at time t. I have come up with three methods that all give the correct answer.
The main thing I am wondering is Q1 below. I appreciate Q2 and Q3 could be subjective, but I am mostly checking whether there is a much better method I am completely missing.
Q1. Is there a much better way to implement this formula across a dataframe/array that I have missed (i.e. not one of the three methods in the code)? If so, please let me know.
More subjectively: I could time each method and simply pick the fastest, but I was also wondering:
Q2. Are any of these methods preferred because they are clearer, better written, use fewer resources, or are more 'Pythonic'?
Q3. Or are all three absolutely fine, so that it is just a matter of preference? One big reason I ask is that I often hear people say they try to avoid loops...
The formula to be applied is:
ans_t = x * var1_t + (1 - x) * var2_t - y * max(0, var3_t - z)
Note: _t means at time t.
Due to the time series nature of it, I could not get something like this to work:
x * df['var1'] + (1 - x) * df['var2'] - y * max(0, df['var3'] - z)
Therefore I went for the 3 methods below:
# %%
import numpy as np
import pandas as pd
# example dataframe
df = pd.DataFrame({'var1': [6, 8, 11, 15, 10], 'var2': [1, 8, 2, 15, 4], 'var3': [21, 82, 22, 115, 64]})
# constants
x = 0.44
y = 1.68
z = 22
# function to evaluate: ans_t = x * var1_t + (1 - x) * var2_t - y * max(0, var3_t - z)
# note: _t means at time t
# %%
# ---- Method 1: use simple for loop ----
df['ans1'] = 0.0
for i in range(len(df)):
    # .loc avoids chained assignment, which may write to a temporary copy
    df.loc[i, 'ans1'] = x * df.loc[i, 'var1'] + (1 - x) * df.loc[i, 'var2'] - y * max(0, df.loc[i, 'var3'] - z)
# %%
# ---- Method 2: apply a lambda function ----
def my_func(var1, var2, var3):
    return x * var1 + (1 - x) * var2 - y * max(0, var3 - z)
# the lambda argument is named row so it does not shadow the constant x
df['ans2'] = df.apply(lambda row: my_func(row['var1'], row['var2'], row['var3']), axis=1)
# %%
# ---- Method 3: numpy vectorize ----
df['ans3'] = np.vectorize(my_func)(df['var1'], df['var2'], df['var3'])
CodePudding user response:
np.maximum (note: not the same as np.max) gives a vectorized way of handling the max element of the formula:
df['ans_t'] = x * df['var1'] + (1 - x) * df['var2'] - y * np.maximum(0, df['var3'] - z)
after which df['ans_t'] is:
0 3.20
1 -92.80
2 5.96
3 -141.24
4 -63.92
Name: ans_t, dtype: float64
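To make the np.maximum versus np.max distinction concrete, here is a minimal sketch using only the var3 column and the constant z from the question. The names excess, floored, largest, clipped and builtin_error are made up for illustration. It also shows what is most likely the reason the original one-line attempt failed (the built-in max cannot compare a scalar with a whole Series), and that Series.clip(lower=0) should be an equivalent pandas spelling.

```python
import numpy as np
import pandas as pd

# Same var3 values and constant z as in the question
df = pd.DataFrame({'var3': [21, 82, 22, 115, 64]})
z = 22
excess = df['var3'] - z  # illustrative name: element-wise var3_t - z

# np.maximum compares element by element and keeps the length of the Series
floored = np.maximum(0, excess)

# np.max reduces the whole Series to a single scalar, which is not what the formula needs
largest = np.max(excess)

# Series.clip(lower=0) floors every element at 0 as well
clipped = excess.clip(lower=0)

# The built-in max needs a single True/False from the comparison,
# so it raises ValueError when given a Series
builtin_error = None
try:
    max(0, excess)
except ValueError as err:
    builtin_error = err
```

Either floored or clipped can be dropped into the one-line formula in place of the max term; which one reads better is largely a matter of taste.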