Home > Enterprise >  Dataframe column rule
Dataframe column rule

Time:04-21

Need some help here: After creating the 'prev' column I'd like to set NaN for every first change of column 'cat'. After that I need to create a column 'ln_return' with some conditions:

1.If prev=NaN then NaN

2.If (prev<>0 and act/prev>0) then ln(act/prev)

3.If (prev=0 and act=0) then 0

4.If prev=0 then ln(act/0.000001)

5.else ln(0.000001)

import pandas as pd

data = {'cat': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
        'date': ['2021-12-30', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06',\
                '2022-01-09', '2022-01-10', '2021-12-30', '2022-01-02', '2022-01-03', '2022-01-04', \
                '2022-01-05', '2022-01-06', '2022-01-09', '2022-01-10','2021-12-30', '2022-01-02', \
                '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-09', '2022-01-10'],
        'act': [1.0000, 0.7981, 0.7785, 0.3563, 0.1916, 0.0000, 0.0000, 0.0233, 1.0000, 0.5625, 0.5774, \
                0.6777, 0.7300, 0.1951, 0.1966, 0.6413, 1.0000, 0.7905, 0.7867, 0.000, 0.8769, 0.4683, 0.7122, 0.7183]
    }

df =pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['prev'] = df['act'].shift(1)
print(df)

out:                                        Expected:
    cat       date     act    prev        cat       date        act    prev ln_return
0     1 2021-12-30  1.0000     NaN          1 2021-12-30     1.0000     NaN       NaN
1     1 2022-01-02  0.7981  1.0000          1 2022-01-02     0.7981  1,0000   -0.2255
2     1 2022-01-03  0.7785  0.7981          1 2022-01-03     0.7785  0,7981   -0.0249
3     1 2022-01-04  0.3563  0.7785          1 2022-01-04     0.3563  0,7785   -0.7816
4     1 2022-01-05  0.1916  0.3563          1 2022-01-05     0.1916  0,3563   -0.6204
5     1 2022-01-06  0.0000  0.1916          1 2022-01-06     0.0000  0,1916  -13.8155
6     1 2022-01-09  0.0000  0.0000          1 2022-01-09     0.0000  0,0000    0.0000
7     1 2022-01-10  0.0233  0.0000          1 2022-01-10     0.0233  0,0000    7.7536
8     2 2021-12-30  1.0000  0.0233          2 2021-12-30     1.0000     NaN       NaN
9     2 2022-01-02  0.5625  1.0000          2 2022-01-02     0.5625  1,0000   -0.5754
10    2 2022-01-03  0.5774  0.5625          2 2022-01-03     0.5774  0,5625    0.0261
11    2 2022-01-04  0.6777  0.5774          2 2022-01-04     0.6777  0,5774    0.1602
12    2 2022-01-05  0.7300  0.6777          2 2022-01-05     0.7300  0,6777    0.0743
13    2 2022-01-06  0.1951  0.7300          2 2022-01-06     0.1951  0,7300   -1.3195
14    2 2022-01-09  0.1966  0.1951          2 2022-01-09     0.1966  0,1951    0.0077
15    2 2022-01-10  0.6413  0.1966          2 2022-01-10     0.6413  0,1966    1.1823
16    3 2021-12-30  1.0000  0.6413          3 2021-12-30     1.0000     NaN      NaN
17    3 2022-01-02  0.7905  1.0000          3 2022-01-02     0.7905  1,0000   -0.2351
18    3 2022-01-03  0.7867  0.7905          3 2022-01-03     0.7867  0,7905   -0.0048
19    3 2022-01-04  0.0000  0.7867          3 2022-01-04     0.0000  0,7867  -13.8155
20    3 2022-01-05  0.8769  0.0000          3 2022-01-05     0.8769  0,0000   11.3816
21    3 2022-01-06  0.4683  0.8769          3 2022-01-06     0.4683  0,8769   -0.6273
22    3 2022-01-09  0.7122  0.4683          3 2022-01-09     0.7122  0,4683    0.4192
23    3 2022-01-10  0.7183  0.7122          3 2022-01-10     0.7183  0,7122    0.0085

CodePudding user response:

IIUC, you could do:

# perform a shift per group
df['prev'] = df.groupby('cat')['act'].shift(1)

# compute all conditions but #3
df['ln_return'] = np.log(df['act']/df['prev'].clip(0.000001)).clip(np.log(0.000001))

# condition #3 (both 0 -> 0)
df['ln_return'] = df['ln_return'].mask(df['prev'].eq(0)&df['act'].eq(0), 0)

Output:

    cat       date     act    prev  ln_return
0     1 2021-12-30  1.0000     NaN        NaN
1     1 2022-01-02  0.7981  1.0000  -0.225521
2     1 2022-01-03  0.7785  0.7981  -0.024865
3     1 2022-01-04  0.3563  0.7785  -0.781596
4     1 2022-01-05  0.1916  0.3563  -0.620363
5     1 2022-01-06  0.0000  0.1916 -13.815511
6     1 2022-01-09  0.0000  0.0000   0.000000
7     1 2022-01-10  0.0233  0.0000  10.056209
8     2 2021-12-30  1.0000     NaN        NaN
9     2 2022-01-02  0.5625  1.0000  -0.575364
10    2 2022-01-03  0.5774  0.5625   0.026144
11    2 2022-01-04  0.6777  0.5774   0.160169
12    2 2022-01-05  0.7300  0.6777   0.074340
13    2 2022-01-06  0.1951  0.7300  -1.319532
14    2 2022-01-09  0.1966  0.1951   0.007659
15    2 2022-01-10  0.6413  0.1966   1.182326
16    3 2021-12-30  1.0000     NaN        NaN
17    3 2022-01-02  0.7905  1.0000  -0.235090
18    3 2022-01-03  0.7867  0.7905  -0.004819
19    3 2022-01-04  0.0000  0.7867 -13.815511
20    3 2022-01-05  0.8769  0.0000  13.684148
21    3 2022-01-06  0.4683  0.8769  -0.627284
22    3 2022-01-09  0.7122  0.4683   0.419250
23    3 2022-01-10  0.7183  0.7122   0.008529
  • Related