Home > Enterprise >  Groupby dataframe by column value and calculate new column
Groupby dataframe by column value and calculate new column

Time:10-25

I have a dataframe that looks like that (Date is the index):

Date    Value   Sensor
19/08/2021  8787    A
20/08/2021  7360    A
23/08/2021  17570   A
24/08/2021  18993   A
25/08/2021  17947   A
26/08/2021  18317   A
27/08/2021  16940   A
30/08/2021  23837   A
31/08/2021  18353   A
19/08/2021  15010   B
20/08/2021  10873   B
23/08/2021  5270    B
24/08/2021  21060   B
25/08/2021  13430   B
26/08/2021  6267    B
27/08/2021  33677   B
30/08/2021  9460    B
31/08/2021  8223    B

I need to calculate a new column that would be the current value of the value column divided by the value 5 days ago, plus the value of 5 days ago divided by the value 15 days ago. This has to be done for each sensor. Values which cannot be calculated can be filled with NA or anything meaningless. This is what I would like to get as reslt:

Date    Value   Sensor  Calc
2/08/2021   8247    A   NA
3/08/2021   20267   A   NA
4/08/2021   10507   A   NA
5/08/2021   10397   A   NA
6/08/2021   15513   A   NA
9/08/2021   10063   A   NA
10/08/2021  7787    A   NA
11/08/2021  7607    A   NA
12/08/2021  12823   A   NA
13/08/2021  10410   A   NA
16/08/2021  12810   A   NA
17/08/2021  22507   A   NA
18/08/2021  15050   A   NA
19/08/2021  8787    A   NA
20/08/2021  7360    A   NA
23/08/2021  17570   A   2.924876806
24/08/2021  18993   A   1.954395294
25/08/2021  17947   A   2.624870109
26/08/2021  18317   A   2.92970437
27/08/2021  16940   A   2.776071226
30/08/2021  23837   A   3.102687734
31/08/2021  18353   A   3.405368483
2/08/2021   18170   B   NA
3/08/2021   11347   B   NA
4/08/2021   10657   B   NA
5/08/2021   18710   B   NA
6/08/2021   8707    B   NA
9/08/2021   8490    B   NA
10/08/2021  11043   B   NA
11/08/2021  10953   B   NA
12/08/2021  9663    B   NA
13/08/2021  20227   B   NA
16/08/2021  14010   B   NA
17/08/2021  15863   B   NA
18/08/2021  11583   B   NA
19/08/2021  15010   B   NA
20/08/2021  10873   B   NA
23/08/2021  5270    B   1.147211069
24/08/2021  21060   B   2.725608385
25/08/2021  13430   B   2.246349071
26/08/2021  6267    B   1.219766441
27/08/2021  33677   B   4.346070613
30/08/2021  9460    B   2.415796685
31/08/2021  8223    B   2.297546305

Within one sensor, I can do it with something like

df['Calc'] = (df.iloc[:, 1].div(df.iloc[:, 1].shift(5)) df.iloc[:, 1].shift(15).div(df.iloc[:, 1].shift(15))

but I am struggling to do it within the group.

Bonus question :-) Once the calculation is done I need to transpose the dataframe and use sensor as index and dates as column headers with "Calc"s as horizontal values.

Appreciate your help!

CodePudding user response:

You can use DataFrameGroupBy.shift, for avoid repeating is assign groupby object to variable g, but first add missing datetimes per groups:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df = df.set_index('Date').groupby('Sensor')['Value'].resample('D').first().reset_index()
print (df)
   Sensor       Date    Value
0       A 2021-08-19   8787.0
1       A 2021-08-20   7360.0
2       A 2021-08-21      NaN
3       A 2021-08-22      NaN
4       A 2021-08-23  17570.0
5       A 2021-08-24  18993.0
6       A 2021-08-25  17947.0
7       A 2021-08-26  18317.0
8       A 2021-08-27  16940.0
9       A 2021-08-28      NaN
10      A 2021-08-29      NaN
11      A 2021-08-30  23837.0
12      A 2021-08-31  18353.0
13      B 2021-08-19  15010.0
14      B 2021-08-20  10873.0
15      B 2021-08-21      NaN
16      B 2021-08-22      NaN
17      B 2021-08-23   5270.0
18      B 2021-08-24  21060.0
19      B 2021-08-25  13430.0
20      B 2021-08-26   6267.0
21      B 2021-08-27  33677.0
22      B 2021-08-28      NaN
23      B 2021-08-29      NaN
24      B 2021-08-30   9460.0
25      B 2021-08-31   8223.0

g = df.groupby('Sensor')['Value']
df['Calc'] = (df['Value'].div(g.shift(5))) g.shift(5).div(g.shift(15))

Last need pivoting:

df1 = df.pivot('Sensor','Date','Calc')

CodePudding user response:

You can use groupby.apply and shift with a period of 5/15 'D' (days) after setting the Date as index:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

def compute(s):
    return (s.div(s.shift(5, freq='D'))
             s.shift(5, freq='D').div(s.shift(15, freq='D'))
           )

df.set_index('Date').groupby('Sensor')['Value'].apply(compute).reset_index()

Output: unfortunately, your example gives only NaNs, here is the output for only the division by the last 5 days

   Sensor       Date     Value
0       A 2021-08-19       NaN
1       A 2021-08-20       NaN
2       A 2021-08-23       NaN
3       A 2021-08-24  2.161489
4       A 2021-08-25  2.438451
5       A 2021-08-26       NaN
6       A 2021-08-27       NaN
7       A 2021-08-28       NaN
8       A 2021-08-29       NaN
9       A 2021-08-30  1.328189
10      A 2021-08-31  1.001965
11      A 2021-09-01       NaN
12      A 2021-09-04       NaN
13      A 2021-09-05       NaN
14      B 2021-08-19       NaN
15      B 2021-08-20       NaN
16      B 2021-08-23       NaN
17      B 2021-08-24  1.403065
18      B 2021-08-25  1.235170
19      B 2021-08-26       NaN
20      B 2021-08-27       NaN
21      B 2021-08-28       NaN
22      B 2021-08-29       NaN
23      B 2021-08-30  0.704393
24      B 2021-08-31  1.312111
25      B 2021-09-01       NaN
26      B 2021-09-04       NaN
27      B 2021-09-05       NaN

For a wide format:

df.set_index('Date').groupby('Sensor')['Value'].apply(compute).unstack('Date')

output:

Date    2021-08-19  2021-08-20  2021-08-23  2021-08-24  2021-08-25  \
Sensor                                                               
A              NaN         NaN         NaN    2.161489    2.438451   
B              NaN         NaN         NaN    1.403065    1.235170   

Date    2021-08-26  2021-08-27  2021-08-28  2021-08-29  2021-08-30  \
Sensor                                                               
A              NaN         NaN         NaN         NaN    1.328189   
B              NaN         NaN         NaN         NaN    0.704393   

Date    2021-08-31  2021-09-01  2021-09-04  2021-09-05  
Sensor                                                  
A         1.001965         NaN         NaN         NaN  
B         1.312111         NaN         NaN         NaN  
  • Related