Need some help here: After creating the 'prev' column I'd like to set NaN for every first change of column 'cat'. After that I need to create a column 'ln_return' with some conditions:
1.If prev=NaN then NaN
2.If (prev<>0 and act/prev>0) then ln(act/prev)
3.If (prev=0 and act=0) then 0
4.If prev=0 then ln(act/0.000001)
5.else ln(0.000001)
import pandas as pd
data = {'cat': [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3],
'date': ['2021-12-30', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06',\
'2022-01-09', '2022-01-10', '2021-12-30', '2022-01-02', '2022-01-03', '2022-01-04', \
'2022-01-05', '2022-01-06', '2022-01-09', '2022-01-10','2021-12-30', '2022-01-02', \
'2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06', '2022-01-09', '2022-01-10'],
'act': [1.0000, 0.7981, 0.7785, 0.3563, 0.1916, 0.0000, 0.0000, 0.0233, 1.0000, 0.5625, 0.5774, \
0.6777, 0.7300, 0.1951, 0.1966, 0.6413, 1.0000, 0.7905, 0.7867, 0.000, 0.8769, 0.4683, 0.7122, 0.7183]
}
df =pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df['prev'] = df['act'].shift(1)
print(df)
out: Expected:
cat date act prev cat date act prev ln_return
0 1 2021-12-30 1.0000 NaN 1 2021-12-30 1.0000 NaN NaN
1 1 2022-01-02 0.7981 1.0000 1 2022-01-02 0.7981 1,0000 -0.2255
2 1 2022-01-03 0.7785 0.7981 1 2022-01-03 0.7785 0,7981 -0.0249
3 1 2022-01-04 0.3563 0.7785 1 2022-01-04 0.3563 0,7785 -0.7816
4 1 2022-01-05 0.1916 0.3563 1 2022-01-05 0.1916 0,3563 -0.6204
5 1 2022-01-06 0.0000 0.1916 1 2022-01-06 0.0000 0,1916 -13.8155
6 1 2022-01-09 0.0000 0.0000 1 2022-01-09 0.0000 0,0000 0.0000
7 1 2022-01-10 0.0233 0.0000 1 2022-01-10 0.0233 0,0000 7.7536
8 2 2021-12-30 1.0000 0.0233 2 2021-12-30 1.0000 NaN NaN
9 2 2022-01-02 0.5625 1.0000 2 2022-01-02 0.5625 1,0000 -0.5754
10 2 2022-01-03 0.5774 0.5625 2 2022-01-03 0.5774 0,5625 0.0261
11 2 2022-01-04 0.6777 0.5774 2 2022-01-04 0.6777 0,5774 0.1602
12 2 2022-01-05 0.7300 0.6777 2 2022-01-05 0.7300 0,6777 0.0743
13 2 2022-01-06 0.1951 0.7300 2 2022-01-06 0.1951 0,7300 -1.3195
14 2 2022-01-09 0.1966 0.1951 2 2022-01-09 0.1966 0,1951 0.0077
15 2 2022-01-10 0.6413 0.1966 2 2022-01-10 0.6413 0,1966 1.1823
16 3 2021-12-30 1.0000 0.6413 3 2021-12-30 1.0000 NaN NaN
17 3 2022-01-02 0.7905 1.0000 3 2022-01-02 0.7905 1,0000 -0.2351
18 3 2022-01-03 0.7867 0.7905 3 2022-01-03 0.7867 0,7905 -0.0048
19 3 2022-01-04 0.0000 0.7867 3 2022-01-04 0.0000 0,7867 -13.8155
20 3 2022-01-05 0.8769 0.0000 3 2022-01-05 0.8769 0,0000 11.3816
21 3 2022-01-06 0.4683 0.8769 3 2022-01-06 0.4683 0,8769 -0.6273
22 3 2022-01-09 0.7122 0.4683 3 2022-01-09 0.7122 0,4683 0.4192
23 3 2022-01-10 0.7183 0.7122 3 2022-01-10 0.7183 0,7122 0.0085
CodePudding user response:
IIUC, you could do:
# perform a shift per group
df['prev'] = df.groupby('cat')['act'].shift(1)
# compute all conditions but #3
df['ln_return'] = np.log(df['act']/df['prev'].clip(0.000001)).clip(np.log(0.000001))
# condition #3 (both 0 -> 0)
df['ln_return'] = df['ln_return'].mask(df['prev'].eq(0)&df['act'].eq(0), 0)
Output:
cat date act prev ln_return
0 1 2021-12-30 1.0000 NaN NaN
1 1 2022-01-02 0.7981 1.0000 -0.225521
2 1 2022-01-03 0.7785 0.7981 -0.024865
3 1 2022-01-04 0.3563 0.7785 -0.781596
4 1 2022-01-05 0.1916 0.3563 -0.620363
5 1 2022-01-06 0.0000 0.1916 -13.815511
6 1 2022-01-09 0.0000 0.0000 0.000000
7 1 2022-01-10 0.0233 0.0000 10.056209
8 2 2021-12-30 1.0000 NaN NaN
9 2 2022-01-02 0.5625 1.0000 -0.575364
10 2 2022-01-03 0.5774 0.5625 0.026144
11 2 2022-01-04 0.6777 0.5774 0.160169
12 2 2022-01-05 0.7300 0.6777 0.074340
13 2 2022-01-06 0.1951 0.7300 -1.319532
14 2 2022-01-09 0.1966 0.1951 0.007659
15 2 2022-01-10 0.6413 0.1966 1.182326
16 3 2021-12-30 1.0000 NaN NaN
17 3 2022-01-02 0.7905 1.0000 -0.235090
18 3 2022-01-03 0.7867 0.7905 -0.004819
19 3 2022-01-04 0.0000 0.7867 -13.815511
20 3 2022-01-05 0.8769 0.0000 13.684148
21 3 2022-01-06 0.4683 0.8769 -0.627284
22 3 2022-01-09 0.7122 0.4683 0.419250
23 3 2022-01-10 0.7183 0.7122 0.008529