Home > front end >  Calculating 15mins data from Cumulative Sums
Calculating 15mins data from Cumulative Sums

Time:10-17

I came up with a problem. I have real-time production data with Machine Codes [M] column, which has DateTime stamps [DateTime] with cumulative sums [Cumulative] column. From the original data, I created 15mins time intervals by flooring the DateTime column.

To problem is, I'd like to take latest CumulativeSum of the each machine for each TimeBins, and take the difference of each 15mins. time intervals. So, I will be able to calculate the 15mins. real-time production data of each machine with the new data frame, with a new column, named [Diff].

Please find the sample code for to represent my problem.

Original Data:

df=pd.DataFrame({'M':['18','18','18','19','19','19','18','18','18','19','19','19','19'],
             'Cumulative':['8','10','11','5','8','9','13','16','17','14','19','20','22'],
             'DateTime': ['2022-08-01 07:14:28','2022-08-01 07:25:58','2022-08-01 07:29:19',
                         '2022-08-01 07:13:17','2022-08-01 07:28:58','2022-08-01 07:29:01',
                         '2022-08-01 07:34:54','2022-08-01 07:36:02','2022-08-01 07:38:17',
                         '2022-08-01 07:33:46','2022-08-01 07:37:09','2022-08-01 07:38:17','2022-08-01 07:41:38']})

I created 15mins. TimeBins by floor the DateTime to "15T" buckets. And change the formats of the each column.

df['DateTime'] = pd.to_datetime(df['DateTime'])
df['TimeBins'] = df['DateTime'].dt.floor(freq='15T')
df['Cumulative'] = df['Cumulative'].astype('int32')

The new dataframe that I'd like to have is as follows,

pd.DataFrame({'M':['18','18','18',
                   '19','19','19'],
             'DateTime':['2022-08-01 07:14:28 ','2022-08-01 07:29:19','2022-08-01 07:38:17',
                        '2022-08-01 07:13:17','2022-08-01 07:29:01','2022-08-01 07:41:38'],
             'TimeBins':['2022-08-01 07:00:00','2022-08-01 07:15:00 ','2022-08-01 07:30:00',
                        '2022-08-01 07:00:00','2022-08-01 07:15:00','2022-08-01 07:30:00'],
             'Cumulative':['8','11','17',
                          '5','9','22'],
             'Diff':['NaN','3','6',
                    'NaN','4','11']})

    M   DateTime                TimeBins        Cumulative       Diff
0   18  2022-08-01 07:14:28 2022-08-01 07:00:00        8        NaN
1   18  2022-08-01 07:29:19 2022-08-01 07:15:00       11        3
2   18  2022-08-01 07:38:17 2022-08-01 07:30:00       17        6
3   19  2022-08-01 07:13:17 2022-08-01 07:00:00        5        NaN
4   19  2022-08-01 07:29:01 2022-08-01 07:15:00        9        4
5   19  2022-08-01 07:41:38 2022-08-01 07:30:00       22        11

CodePudding user response:

Use GroupBy.last and DataFrameGroupBy.diff:

df['DateTime'] = pd.to_datetime(df['DateTime'])
df['TimeBins'] = df['DateTime'].dt.floor(freq='15T')
df['Cumulative'] = df['Cumulative'].astype('int32')

cols = ['M','DateTime','TimeBins','Cumulative']
df = df.groupby(['M','TimeBins'], as_index=False).last()[cols]
df['Diff'] = df.groupby('M')['Cumulative'].diff()
print (df)
    M            DateTime            TimeBins  Cumulative  Diff
0  18 2022-08-01 07:14:28 2022-08-01 07:00:00           8   NaN
1  18 2022-08-01 07:29:19 2022-08-01 07:15:00          11   3.0
2  18 2022-08-01 07:38:17 2022-08-01 07:30:00          17   6.0
3  19 2022-08-01 07:13:17 2022-08-01 07:00:00           5   NaN
4  19 2022-08-01 07:29:01 2022-08-01 07:15:00           9   4.0
5  19 2022-08-01 07:41:38 2022-08-01 07:30:00          22  13.0

Another solution with Grouper:

df['DateTime'] = pd.to_datetime(df['DateTime'])
df['Cumulative'] = df['Cumulative'].astype('int32')


cols = ['M','DateTime','TimeBins','Cumulative']

df = (df.groupby(['M',pd.Grouper(freq='15T', key='DateTime')])[['DateTime','Cumulative']]
       .last().rename_axis(['M','TimeBins'])
       .reset_index()[cols])
df['Diff'] = df.groupby('M')['Cumulative'].diff()
print (df)
    M            DateTime            TimeBins  Cumulative  Diff
0  18 2022-08-01 07:14:28 2022-08-01 07:00:00           8   NaN
1  18 2022-08-01 07:29:19 2022-08-01 07:15:00          11   3.0
2  18 2022-08-01 07:38:17 2022-08-01 07:30:00          17   6.0
3  19 2022-08-01 07:13:17 2022-08-01 07:00:00           5   NaN
4  19 2022-08-01 07:29:01 2022-08-01 07:15:00           9   4.0
5  19 2022-08-01 07:41:38 2022-08-01 07:30:00          22  13.0
  • Related