Home > OS >  Pandas sum column based on window interval of other column
Pandas sum column based on window interval of other column

Time:10-09

Have a df like so:

weight   timeblock  
1        1620975600   
1        1620975640  
1        1620975700 
1        1620975715 
1        1620975740 
1        1620975790  
1        1620975800

and I'd like to sum the weight column based on 45-second intervals of the timeblock interval so that it looks like:

weight     interval
2          1620975600
2          1620975690
1          1620975735
2          1620975780

where the interval column represents the 45 second time interval beginning at that time.

How would I do this and skip the 45-second intervals that timeblock doesn't exist in?

CodePudding user response:

You could use pd.cut to create bins in 45 second increments, then groupby the bin and get the size and drop zero counts.

import pandas as pd

df = pd.DataFrame({'weight': [1, 1, 1, 1, 1, 1, 1],
 'timeblock': [1620975600,
  1620975640,
  1620975700,
  1620975715,
  1620975740,
  1620975790,
  1620975800]})


df['interval'] = pd.cut(df.timeblock,
                   bins=[x for x in range(df.timeblock.min(), df.timeblock.max() 45, 45)],
                   labels=[x for x in range(df.timeblock.min(), df.timeblock.max(), 45)],
                   include_lowest=True)

df = df.groupby('interval').size().reset_index(name='weight')

df.loc[df['weight']>0]

Output

     interval  weight
0  1620975600       2
2  1620975690       2
3  1620975735       1
4  1620975780       2

CodePudding user response:

Not sure why they deleted their comment but this seems to work:

df = df.groupby(df.timeblock // 45).weight.sum().reset_index()

df['timeblock'] = df['timeblock'].apply(lambda x: x*45)
  • Related