Comparing one month's value of current year with previous year values adding or substracting mu-CodePudding

Given a following dataframe df:

          date  mom_pct
0    2020-1-31      1.4
1    2020-2-29      0.8
2    2020-3-31     -1.2
3    2020-4-30     -0.9
4    2020-5-31     -0.8
5    2020-6-30     -0.1
6    2020-7-31      0.6
7    2020-8-31      0.4
8    2020-9-30      0.2
9   2020-10-31     -0.3
10  2020-11-30     -0.6
11  2020-12-31      0.7
12   2021-1-31      1.0
13   2021-2-28      0.6
14   2021-3-31     -0.5
15   2021-4-30     -0.3
16   2021-5-31     -0.2
17   2021-6-30     -0.4
18   2021-7-31      0.3
19   2021-8-31      0.1
20   2021-9-30      0.0
21  2021-10-31      0.7
22  2021-11-30      0.4
23  2021-12-31     -0.3
24   2022-1-31      0.4
25   2022-2-28      0.6
26   2022-3-31      0.0
27   2022-4-30      0.4
28   2022-5-31     -0.2

I want to compare the chain ratio value of a month of the current year to the value of the month of the previous year. Assume that the value of the same period last year is y_t-1, and the current value of this year is y_t. I will create a new column according to the following rules:

If y_t = y_t-1, returns 0 for new column;
If y_t ∈ (y_t-1, y_t-1 0.3], returns 1;
If y_t ∈ (y_t-1 0.3, y_t-1 0.5], returns 2;
If y_t > (y_t-1 0.5), returns 3;
If y_t ∈ [y_t-1 - 0.3, y_t-1), returns -1;
If y_t ∈ [y_t-1 - 0.5, y_t-1 - 0.3), returns -2;
If y_t < (y_t-1 - 0.5), returns -3

The expected result:

          date  mom_pct  categorial_mom_pct
0    2020-1-31      1.0                 NaN
1    2020-2-29      0.8                 NaN
2    2020-3-31     -1.2                 NaN
3    2020-4-30     -0.9                 NaN
4    2020-5-31     -0.8                 NaN
5    2020-6-30     -0.1                 NaN
6    2020-7-31      0.6                 NaN
7    2020-8-31      0.4                 NaN
8    2020-9-30      0.2                 NaN
9   2020-10-31     -0.3                 NaN
10  2020-11-30     -0.6                 NaN
11  2020-12-31      0.7                 NaN
12   2021-1-31      1.0                 0.0
13   2021-2-28      0.6                -1.0
14   2021-3-31     -0.5                 3.0
15   2021-4-30     -0.3                 3.0
16   2021-5-31     -0.2                 3.0
17   2021-6-30     -0.4                -1.0
18   2021-7-31      0.3                -1.0
19   2021-8-31      0.1                -1.0
20   2021-9-30      0.0                -1.0
21  2021-10-31      0.7                 3.0
22  2021-11-30      0.4                 3.0
23  2021-12-31     -0.3                -3.0
24   2022-1-31      0.4                -3.0
25   2022-2-28      0.6                 0.0
26   2022-3-31      0.0                 2.0
27   2022-4-30      0.4                 3.0
28   2022-5-31     -0.2                 0.0

I attempt to create multiple columns and ranges, then check mom_pct is in which range. Is it possible to do that in a more effecient way? Thanks.

df1['mom_pct_zero'] = df1['mom_pct'].shift(12)
df1['mom_pct_pos1'] = df1['mom_pct'].shift(12)   0.3
df1['mom_pct_pos2'] = df1['mom_pct'].shift(12)   0.5
df1['mom_pct_neg1'] = df1['mom_pct'].shift(12) - 0.3
df1['mom_pct_neg2'] = df1['mom_pct'].shift(12) - 0.5

CodePudding user response：

I would do it as follows

def categorize(v):
    if np.isnan(v) or v == 0.:
        return v
    sign = -1 if v < 0 else 1
    eps = 1e-10
    if abs(v) <= 0.3   eps:
        return sign * 1
    if abs(v) <= 0.5   eps:
        return sign * 2
    return sign * 3

df['categorial_mom_pct'] = df['mom_pct'].diff(12).map(categorize)
print(df)

Note that I added a very small eps to the threshold to counter the precision issue with floating point arithmetic

abs(-0.3) <= 0.3  # True
abs(-0.4   0.1) <= 0.3  # False
abs(-0.4   0.1) <= 0.3   1e-10  # True

Out:

          date  mom_pct  categorial_mom_pct
0    2020-1-31      1.0                 NaN
1    2020-2-29      0.8                 NaN
2    2020-3-31     -1.2                 NaN
3    2020-4-30     -0.9                 NaN
4    2020-5-31     -0.8                 NaN
5    2020-6-30     -0.1                 NaN
6    2020-7-31      0.6                 NaN
7    2020-8-31      0.4                 NaN
8    2020-9-30      0.2                 NaN
9   2020-10-31     -0.3                 NaN
10  2020-11-30     -0.6                 NaN
11  2020-12-31      0.7                 NaN
12   2021-1-31      1.0                 0.0
13   2021-2-28      0.6                -1.0
14   2021-3-31     -0.5                 3.0
15   2021-4-30     -0.3                 3.0
16   2021-5-31     -0.2                 3.0
17   2021-6-30     -0.4                -1.0
18   2021-7-31      0.3                -1.0
19   2021-8-31      0.1                -1.0
20   2021-9-30      0.0                -1.0
21  2021-10-31      0.7                 3.0
22  2021-11-30      0.4                 3.0
23  2021-12-31     -0.3                -3.0
24   2022-1-31      0.4                -3.0
25   2022-2-28      0.6                 0.0
26   2022-3-31      0.0                 2.0
27   2022-4-30      0.4                 3.0
28   2022-5-31     -0.2                 0.0