Have a df like so:
weight timeblock
1 1620975600
1 1620975640
1 1620975700
1 1620975715
1 1620975740
1 1620975790
1 1620975800
and I'd like to sum the weight column based on 45-second intervals of the timeblock interval so that it looks like:
weight interval
2 1620975600
2 1620975690
1 1620975735
2 1620975780
where the interval column represents the 45 second time interval beginning at that time.
How would I do this and skip the 45-second intervals that timeblock doesn't exist in?
CodePudding user response:
You could use pd.cut
to create bins in 45 second increments, then groupby the bin and get the size and drop zero counts.
import pandas as pd
df = pd.DataFrame({'weight': [1, 1, 1, 1, 1, 1, 1],
'timeblock': [1620975600,
1620975640,
1620975700,
1620975715,
1620975740,
1620975790,
1620975800]})
df['interval'] = pd.cut(df.timeblock,
bins=[x for x in range(df.timeblock.min(), df.timeblock.max() 45, 45)],
labels=[x for x in range(df.timeblock.min(), df.timeblock.max(), 45)],
include_lowest=True)
df = df.groupby('interval').size().reset_index(name='weight')
df.loc[df['weight']>0]
Output
interval weight
0 1620975600 2
2 1620975690 2
3 1620975735 1
4 1620975780 2
CodePudding user response:
Not sure why they deleted their comment but this seems to work:
df = df.groupby(df.timeblock // 45).weight.sum().reset_index()
df['timeblock'] = df['timeblock'].apply(lambda x: x*45)