Home > Software engineering >  Comparing one month's value of current year with previous year values adding or substracting mu
Comparing one month's value of current year with previous year values adding or substracting mu

Time:06-22

Given a following dataframe df:

          date  mom_pct
0    2020-1-31      1.4
1    2020-2-29      0.8
2    2020-3-31     -1.2
3    2020-4-30     -0.9
4    2020-5-31     -0.8
5    2020-6-30     -0.1
6    2020-7-31      0.6
7    2020-8-31      0.4
8    2020-9-30      0.2
9   2020-10-31     -0.3
10  2020-11-30     -0.6
11  2020-12-31      0.7
12   2021-1-31      1.0
13   2021-2-28      0.6
14   2021-3-31     -0.5
15   2021-4-30     -0.3
16   2021-5-31     -0.2
17   2021-6-30     -0.4
18   2021-7-31      0.3
19   2021-8-31      0.1
20   2021-9-30      0.0
21  2021-10-31      0.7
22  2021-11-30      0.4
23  2021-12-31     -0.3
24   2022-1-31      0.4
25   2022-2-28      0.6
26   2022-3-31      0.0
27   2022-4-30      0.4
28   2022-5-31     -0.2

I want to compare the chain ratio value of a month of the current year to the value of the month of the previous year. Assume that the value of the same period last year is y_t-1, and the current value of this year is y_t. I will create a new column according to the following rules:

  • If y_t = y_t-1, returns 0 for new column;
  • If y_t ∈ (y_t-1, y_t-1 0.3], returns 1;
  • If y_t ∈ (y_t-1 0.3, y_t-1 0.5], returns 2;
  • If y_t > (y_t-1 0.5), returns 3;
  • If y_t ∈ [y_t-1 - 0.3, y_t-1), returns -1;
  • If y_t ∈ [y_t-1 - 0.5, y_t-1 - 0.3), returns -2;
  • If y_t < (y_t-1 - 0.5), returns -3

The expected result:

          date  mom_pct  categorial_mom_pct
0    2020-1-31      1.0                 NaN
1    2020-2-29      0.8                 NaN
2    2020-3-31     -1.2                 NaN
3    2020-4-30     -0.9                 NaN
4    2020-5-31     -0.8                 NaN
5    2020-6-30     -0.1                 NaN
6    2020-7-31      0.6                 NaN
7    2020-8-31      0.4                 NaN
8    2020-9-30      0.2                 NaN
9   2020-10-31     -0.3                 NaN
10  2020-11-30     -0.6                 NaN
11  2020-12-31      0.7                 NaN
12   2021-1-31      1.0                 0.0
13   2021-2-28      0.6                -1.0
14   2021-3-31     -0.5                 3.0
15   2021-4-30     -0.3                 3.0
16   2021-5-31     -0.2                 3.0
17   2021-6-30     -0.4                -1.0
18   2021-7-31      0.3                -1.0
19   2021-8-31      0.1                -1.0
20   2021-9-30      0.0                -1.0
21  2021-10-31      0.7                 3.0
22  2021-11-30      0.4                 3.0
23  2021-12-31     -0.3                -3.0
24   2022-1-31      0.4                -3.0
25   2022-2-28      0.6                 0.0
26   2022-3-31      0.0                 2.0
27   2022-4-30      0.4                 3.0
28   2022-5-31     -0.2                 0.0

I attempt to create multiple columns and ranges, then check mom_pct is in which range. Is it possible to do that in a more effecient way? Thanks.

df1['mom_pct_zero'] = df1['mom_pct'].shift(12)
df1['mom_pct_pos1'] = df1['mom_pct'].shift(12)   0.3
df1['mom_pct_pos2'] = df1['mom_pct'].shift(12)   0.5
df1['mom_pct_neg1'] = df1['mom_pct'].shift(12) - 0.3
df1['mom_pct_neg2'] = df1['mom_pct'].shift(12) - 0.5

CodePudding user response:

I would do it as follows

def categorize(v):
    if np.isnan(v) or v == 0.:
        return v
    sign = -1 if v < 0 else 1
    eps = 1e-10
    if abs(v) <= 0.3   eps:
        return sign * 1
    if abs(v) <= 0.5   eps:
        return sign * 2
    return sign * 3

df['categorial_mom_pct'] = df['mom_pct'].diff(12).map(categorize)
print(df)

Note that I added a very small eps to the threshold to counter the precision issue with floating point arithmetic

abs(-0.3) <= 0.3  # True
abs(-0.4   0.1) <= 0.3  # False
abs(-0.4   0.1) <= 0.3   1e-10  # True

Out:

          date  mom_pct  categorial_mom_pct
0    2020-1-31      1.0                 NaN
1    2020-2-29      0.8                 NaN
2    2020-3-31     -1.2                 NaN
3    2020-4-30     -0.9                 NaN
4    2020-5-31     -0.8                 NaN
5    2020-6-30     -0.1                 NaN
6    2020-7-31      0.6                 NaN
7    2020-8-31      0.4                 NaN
8    2020-9-30      0.2                 NaN
9   2020-10-31     -0.3                 NaN
10  2020-11-30     -0.6                 NaN
11  2020-12-31      0.7                 NaN
12   2021-1-31      1.0                 0.0
13   2021-2-28      0.6                -1.0
14   2021-3-31     -0.5                 3.0
15   2021-4-30     -0.3                 3.0
16   2021-5-31     -0.2                 3.0
17   2021-6-30     -0.4                -1.0
18   2021-7-31      0.3                -1.0
19   2021-8-31      0.1                -1.0
20   2021-9-30      0.0                -1.0
21  2021-10-31      0.7                 3.0
22  2021-11-30      0.4                 3.0
23  2021-12-31     -0.3                -3.0
24   2022-1-31      0.4                -3.0
25   2022-2-28      0.6                 0.0
26   2022-3-31      0.0                 2.0
27   2022-4-30      0.4                 3.0
28   2022-5-31     -0.2                 0.0
  • Related