Given a following dataframe df
:
date mom_pct
0 2020-1-31 1.4
1 2020-2-29 0.8
2 2020-3-31 -1.2
3 2020-4-30 -0.9
4 2020-5-31 -0.8
5 2020-6-30 -0.1
6 2020-7-31 0.6
7 2020-8-31 0.4
8 2020-9-30 0.2
9 2020-10-31 -0.3
10 2020-11-30 -0.6
11 2020-12-31 0.7
12 2021-1-31 1.0
13 2021-2-28 0.6
14 2021-3-31 -0.5
15 2021-4-30 -0.3
16 2021-5-31 -0.2
17 2021-6-30 -0.4
18 2021-7-31 0.3
19 2021-8-31 0.1
20 2021-9-30 0.0
21 2021-10-31 0.7
22 2021-11-30 0.4
23 2021-12-31 -0.3
24 2022-1-31 0.4
25 2022-2-28 0.6
26 2022-3-31 0.0
27 2022-4-30 0.4
28 2022-5-31 -0.2
I want to compare the chain ratio value of a month of the current year to the value of the month of the previous year. Assume that the value of the same period last year is y_t-1, and the current value of this year is y_t. I will create a new column according to the following rules:
- If y_t = y_t-1, returns 0 for new column;
- If y_t ∈ (y_t-1, y_t-1 0.3], returns 1;
- If y_t ∈ (y_t-1 0.3, y_t-1 0.5], returns 2;
- If y_t > (y_t-1 0.5), returns 3;
- If y_t ∈ [y_t-1 - 0.3, y_t-1), returns -1;
- If y_t ∈ [y_t-1 - 0.5, y_t-1 - 0.3), returns -2;
- If y_t < (y_t-1 - 0.5), returns -3
The expected result:
date mom_pct categorial_mom_pct
0 2020-1-31 1.0 NaN
1 2020-2-29 0.8 NaN
2 2020-3-31 -1.2 NaN
3 2020-4-30 -0.9 NaN
4 2020-5-31 -0.8 NaN
5 2020-6-30 -0.1 NaN
6 2020-7-31 0.6 NaN
7 2020-8-31 0.4 NaN
8 2020-9-30 0.2 NaN
9 2020-10-31 -0.3 NaN
10 2020-11-30 -0.6 NaN
11 2020-12-31 0.7 NaN
12 2021-1-31 1.0 0.0
13 2021-2-28 0.6 -1.0
14 2021-3-31 -0.5 3.0
15 2021-4-30 -0.3 3.0
16 2021-5-31 -0.2 3.0
17 2021-6-30 -0.4 -1.0
18 2021-7-31 0.3 -1.0
19 2021-8-31 0.1 -1.0
20 2021-9-30 0.0 -1.0
21 2021-10-31 0.7 3.0
22 2021-11-30 0.4 3.0
23 2021-12-31 -0.3 -3.0
24 2022-1-31 0.4 -3.0
25 2022-2-28 0.6 0.0
26 2022-3-31 0.0 2.0
27 2022-4-30 0.4 3.0
28 2022-5-31 -0.2 0.0
I attempt to create multiple columns and ranges, then check mom_pct
is in which range. Is it possible to do that in a more effecient way? Thanks.
df1['mom_pct_zero'] = df1['mom_pct'].shift(12)
df1['mom_pct_pos1'] = df1['mom_pct'].shift(12) 0.3
df1['mom_pct_pos2'] = df1['mom_pct'].shift(12) 0.5
df1['mom_pct_neg1'] = df1['mom_pct'].shift(12) - 0.3
df1['mom_pct_neg2'] = df1['mom_pct'].shift(12) - 0.5
CodePudding user response:
I would do it as follows
def categorize(v):
if np.isnan(v) or v == 0.:
return v
sign = -1 if v < 0 else 1
eps = 1e-10
if abs(v) <= 0.3 eps:
return sign * 1
if abs(v) <= 0.5 eps:
return sign * 2
return sign * 3
df['categorial_mom_pct'] = df['mom_pct'].diff(12).map(categorize)
print(df)
Note that I added a very small eps
to the threshold to counter the precision issue with floating point arithmetic
abs(-0.3) <= 0.3 # True
abs(-0.4 0.1) <= 0.3 # False
abs(-0.4 0.1) <= 0.3 1e-10 # True
Out:
date mom_pct categorial_mom_pct
0 2020-1-31 1.0 NaN
1 2020-2-29 0.8 NaN
2 2020-3-31 -1.2 NaN
3 2020-4-30 -0.9 NaN
4 2020-5-31 -0.8 NaN
5 2020-6-30 -0.1 NaN
6 2020-7-31 0.6 NaN
7 2020-8-31 0.4 NaN
8 2020-9-30 0.2 NaN
9 2020-10-31 -0.3 NaN
10 2020-11-30 -0.6 NaN
11 2020-12-31 0.7 NaN
12 2021-1-31 1.0 0.0
13 2021-2-28 0.6 -1.0
14 2021-3-31 -0.5 3.0
15 2021-4-30 -0.3 3.0
16 2021-5-31 -0.2 3.0
17 2021-6-30 -0.4 -1.0
18 2021-7-31 0.3 -1.0
19 2021-8-31 0.1 -1.0
20 2021-9-30 0.0 -1.0
21 2021-10-31 0.7 3.0
22 2021-11-30 0.4 3.0
23 2021-12-31 -0.3 -3.0
24 2022-1-31 0.4 -3.0
25 2022-2-28 0.6 0.0
26 2022-3-31 0.0 2.0
27 2022-4-30 0.4 3.0
28 2022-5-31 -0.2 0.0