This is trivial in Excel, why is this so hard in Python?
The goal is to compute the state variable based on several conditions, including the previous value of the state variable as well. Value is the known integer, Min(3) and Max(3) are simply the minimum and maximum value of a 3 period rolling window shifted forward by one period. This is how far I got.
Index Value Max(3) Min(3)
0 10 nan nan
1 20 nan nan
2 15 nan nan
3 25 20 10
4 15 25 15
5 10 25 15
6 15 20 10
What is the best way to calculate the state variable based on the following conditions:
- a) If Value > Max(3) then 1
- b) If Value < Min(3) then 4
- c) If Value <= Max(3) & Value >= Min (3) & previous State = 1 or 2 then 2
- d) If Value <= Max(3) & Value >= Min (3) & previous State = 4 or 3 then 3
Should look like this in final DataFrame:
Index Value Max(3) Min(3) State
0 10 nan nan nan
1 20 nan nan nan
2 15 nan nan nan
3 25 20 10 1
4 15 25 15 2
5 10 25 15 4
6 15 20 10 3
I have mostly tried this using np.where() funtion but always run into problems once I approach c) and d) conditions.
CodePudding user response:
You can use this:
df.loc[df.Value.gt(df['Max(3)']), 'State'] = 1
df.loc[df.Value.lt(df['Min(3)']), 'State'] = 4
df.loc[df.Value.between(df['Min(3)'], df['Max(3)']) & (df.State.shift(1).isin((3, 4))), 'State'] = 3
df.loc[df.Value.between(df['Min(3)'], df['Max(3)']) & (df.State.shift(1).isin((1,2))), 'State'] = 2
Explaination:
the first statement checks where df.Value is greater than df['Max(3)'] and creates a new column 'State' filled with NaNs, and only 1s on the location of the condition
the seconds line sets 4s where df.Value is smaller than df.['Min(3)']
The last two statements check if df.Value is within Max(3) and Min(3) and compares the df.State last value (.shift). Note: you can also use .between here instead of .isin if the numbers are subsequential.
CodePudding user response:
np.select
can handle multiple conditions quite well with good readability
df['Value'] = df['Value'].astype(float)
conditions =
[
df['Value']>df['Max(3)'],
df['Value']<df['Min(3)'],
(df['Value']<=df['Max(3)']) & (df['Value']>= df['Min(3)']) & (df['State'].shift().isin((1,2))),
(df['Value']<=df['Max(3)']) & (df['Value']>= df['Min(3)']) & (df['State'].shift().isin((3,4)))
]
choicelist = [1, 4, 2, 3]
df['State'] = np.select(conditions, choicelist, np.nan)
print(df)
Index Value Max(3) Min(3) State
0 0 10.0 NaN NaN NaN
1 1 20.0 NaN NaN NaN
2 2 15.0 NaN NaN NaN
3 3 25.0 20.0 10.0 1.0
4 4 15.0 25.0 15.0 2.0
5 5 10.0 25.0 15.0 4.0
6 6 15.0 20.0 10.0 3.0