I have a dataframe that looks like that (Date is the index):
Date Value Sensor
19/08/2021 8787 A
20/08/2021 7360 A
23/08/2021 17570 A
24/08/2021 18993 A
25/08/2021 17947 A
26/08/2021 18317 A
27/08/2021 16940 A
30/08/2021 23837 A
31/08/2021 18353 A
19/08/2021 15010 B
20/08/2021 10873 B
23/08/2021 5270 B
24/08/2021 21060 B
25/08/2021 13430 B
26/08/2021 6267 B
27/08/2021 33677 B
30/08/2021 9460 B
31/08/2021 8223 B
I need to calculate a new column that would be the current value of the value column divided by the value 5 days ago, plus the value of 5 days ago divided by the value 15 days ago. This has to be done for each sensor. Values which cannot be calculated can be filled with NA or anything meaningless. This is what I would like to get as reslt:
Date Value Sensor Calc
2/08/2021 8247 A NA
3/08/2021 20267 A NA
4/08/2021 10507 A NA
5/08/2021 10397 A NA
6/08/2021 15513 A NA
9/08/2021 10063 A NA
10/08/2021 7787 A NA
11/08/2021 7607 A NA
12/08/2021 12823 A NA
13/08/2021 10410 A NA
16/08/2021 12810 A NA
17/08/2021 22507 A NA
18/08/2021 15050 A NA
19/08/2021 8787 A NA
20/08/2021 7360 A NA
23/08/2021 17570 A 2.924876806
24/08/2021 18993 A 1.954395294
25/08/2021 17947 A 2.624870109
26/08/2021 18317 A 2.92970437
27/08/2021 16940 A 2.776071226
30/08/2021 23837 A 3.102687734
31/08/2021 18353 A 3.405368483
2/08/2021 18170 B NA
3/08/2021 11347 B NA
4/08/2021 10657 B NA
5/08/2021 18710 B NA
6/08/2021 8707 B NA
9/08/2021 8490 B NA
10/08/2021 11043 B NA
11/08/2021 10953 B NA
12/08/2021 9663 B NA
13/08/2021 20227 B NA
16/08/2021 14010 B NA
17/08/2021 15863 B NA
18/08/2021 11583 B NA
19/08/2021 15010 B NA
20/08/2021 10873 B NA
23/08/2021 5270 B 1.147211069
24/08/2021 21060 B 2.725608385
25/08/2021 13430 B 2.246349071
26/08/2021 6267 B 1.219766441
27/08/2021 33677 B 4.346070613
30/08/2021 9460 B 2.415796685
31/08/2021 8223 B 2.297546305
Within one sensor, I can do it with something like
df['Calc'] = (df.iloc[:, 1].div(df.iloc[:, 1].shift(5)) df.iloc[:, 1].shift(15).div(df.iloc[:, 1].shift(15))
but I am struggling to do it within the group.
Bonus question :-) Once the calculation is done I need to transpose the dataframe and use sensor as index and dates as column headers with "Calc"s as horizontal values.
Appreciate your help!
CodePudding user response:
You can use DataFrameGroupBy.shift
, for avoid repeating is assign groupby object to variable g
, but first add missing datetimes per groups:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.set_index('Date').groupby('Sensor')['Value'].resample('D').first().reset_index()
print (df)
Sensor Date Value
0 A 2021-08-19 8787.0
1 A 2021-08-20 7360.0
2 A 2021-08-21 NaN
3 A 2021-08-22 NaN
4 A 2021-08-23 17570.0
5 A 2021-08-24 18993.0
6 A 2021-08-25 17947.0
7 A 2021-08-26 18317.0
8 A 2021-08-27 16940.0
9 A 2021-08-28 NaN
10 A 2021-08-29 NaN
11 A 2021-08-30 23837.0
12 A 2021-08-31 18353.0
13 B 2021-08-19 15010.0
14 B 2021-08-20 10873.0
15 B 2021-08-21 NaN
16 B 2021-08-22 NaN
17 B 2021-08-23 5270.0
18 B 2021-08-24 21060.0
19 B 2021-08-25 13430.0
20 B 2021-08-26 6267.0
21 B 2021-08-27 33677.0
22 B 2021-08-28 NaN
23 B 2021-08-29 NaN
24 B 2021-08-30 9460.0
25 B 2021-08-31 8223.0
g = df.groupby('Sensor')['Value']
df['Calc'] = (df['Value'].div(g.shift(5))) g.shift(5).div(g.shift(15))
Last need pivoting:
df1 = df.pivot('Sensor','Date','Calc')
CodePudding user response:
You can use groupby.apply
and shift
with a period of 5/15 'D' (days) after setting the Date as index:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
def compute(s):
return (s.div(s.shift(5, freq='D'))
s.shift(5, freq='D').div(s.shift(15, freq='D'))
)
df.set_index('Date').groupby('Sensor')['Value'].apply(compute).reset_index()
Output: unfortunately, your example gives only NaNs, here is the output for only the division by the last 5 days
Sensor Date Value
0 A 2021-08-19 NaN
1 A 2021-08-20 NaN
2 A 2021-08-23 NaN
3 A 2021-08-24 2.161489
4 A 2021-08-25 2.438451
5 A 2021-08-26 NaN
6 A 2021-08-27 NaN
7 A 2021-08-28 NaN
8 A 2021-08-29 NaN
9 A 2021-08-30 1.328189
10 A 2021-08-31 1.001965
11 A 2021-09-01 NaN
12 A 2021-09-04 NaN
13 A 2021-09-05 NaN
14 B 2021-08-19 NaN
15 B 2021-08-20 NaN
16 B 2021-08-23 NaN
17 B 2021-08-24 1.403065
18 B 2021-08-25 1.235170
19 B 2021-08-26 NaN
20 B 2021-08-27 NaN
21 B 2021-08-28 NaN
22 B 2021-08-29 NaN
23 B 2021-08-30 0.704393
24 B 2021-08-31 1.312111
25 B 2021-09-01 NaN
26 B 2021-09-04 NaN
27 B 2021-09-05 NaN
For a wide format:
df.set_index('Date').groupby('Sensor')['Value'].apply(compute).unstack('Date')
output:
Date 2021-08-19 2021-08-20 2021-08-23 2021-08-24 2021-08-25 \
Sensor
A NaN NaN NaN 2.161489 2.438451
B NaN NaN NaN 1.403065 1.235170
Date 2021-08-26 2021-08-27 2021-08-28 2021-08-29 2021-08-30 \
Sensor
A NaN NaN NaN NaN 1.328189
B NaN NaN NaN NaN 0.704393
Date 2021-08-31 2021-09-01 2021-09-04 2021-09-05
Sensor
A 1.001965 NaN NaN NaN
B 1.312111 NaN NaN NaN