I came up with a problem. I have real-time production data with Machine Codes [M] column, which has DateTime stamps [DateTime] with cumulative sums [Cumulative] column. From the original data, I created 15mins time intervals by flooring the DateTime column.
To problem is, I'd like to take latest CumulativeSum of the each machine for each TimeBins, and take the difference of each 15mins. time intervals. So, I will be able to calculate the 15mins. real-time production data of each machine with the new data frame, with a new column, named [Diff].
Please find the sample code for to represent my problem.
Original Data:
df=pd.DataFrame({'M':['18','18','18','19','19','19','18','18','18','19','19','19','19'],
'Cumulative':['8','10','11','5','8','9','13','16','17','14','19','20','22'],
'DateTime': ['2022-08-01 07:14:28','2022-08-01 07:25:58','2022-08-01 07:29:19',
'2022-08-01 07:13:17','2022-08-01 07:28:58','2022-08-01 07:29:01',
'2022-08-01 07:34:54','2022-08-01 07:36:02','2022-08-01 07:38:17',
'2022-08-01 07:33:46','2022-08-01 07:37:09','2022-08-01 07:38:17','2022-08-01 07:41:38']})
I created 15mins. TimeBins by floor the DateTime to "15T" buckets. And change the formats of the each column.
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['TimeBins'] = df['DateTime'].dt.floor(freq='15T')
df['Cumulative'] = df['Cumulative'].astype('int32')
The new dataframe that I'd like to have is as follows,
pd.DataFrame({'M':['18','18','18',
'19','19','19'],
'DateTime':['2022-08-01 07:14:28 ','2022-08-01 07:29:19','2022-08-01 07:38:17',
'2022-08-01 07:13:17','2022-08-01 07:29:01','2022-08-01 07:41:38'],
'TimeBins':['2022-08-01 07:00:00','2022-08-01 07:15:00 ','2022-08-01 07:30:00',
'2022-08-01 07:00:00','2022-08-01 07:15:00','2022-08-01 07:30:00'],
'Cumulative':['8','11','17',
'5','9','22'],
'Diff':['NaN','3','6',
'NaN','4','11']})
M DateTime TimeBins Cumulative Diff
0 18 2022-08-01 07:14:28 2022-08-01 07:00:00 8 NaN
1 18 2022-08-01 07:29:19 2022-08-01 07:15:00 11 3
2 18 2022-08-01 07:38:17 2022-08-01 07:30:00 17 6
3 19 2022-08-01 07:13:17 2022-08-01 07:00:00 5 NaN
4 19 2022-08-01 07:29:01 2022-08-01 07:15:00 9 4
5 19 2022-08-01 07:41:38 2022-08-01 07:30:00 22 11
CodePudding user response:
Use GroupBy.last
and DataFrameGroupBy.diff
:
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['TimeBins'] = df['DateTime'].dt.floor(freq='15T')
df['Cumulative'] = df['Cumulative'].astype('int32')
cols = ['M','DateTime','TimeBins','Cumulative']
df = df.groupby(['M','TimeBins'], as_index=False).last()[cols]
df['Diff'] = df.groupby('M')['Cumulative'].diff()
print (df)
M DateTime TimeBins Cumulative Diff
0 18 2022-08-01 07:14:28 2022-08-01 07:00:00 8 NaN
1 18 2022-08-01 07:29:19 2022-08-01 07:15:00 11 3.0
2 18 2022-08-01 07:38:17 2022-08-01 07:30:00 17 6.0
3 19 2022-08-01 07:13:17 2022-08-01 07:00:00 5 NaN
4 19 2022-08-01 07:29:01 2022-08-01 07:15:00 9 4.0
5 19 2022-08-01 07:41:38 2022-08-01 07:30:00 22 13.0
Another solution with Grouper
:
df['DateTime'] = pd.to_datetime(df['DateTime'])
df['Cumulative'] = df['Cumulative'].astype('int32')
cols = ['M','DateTime','TimeBins','Cumulative']
df = (df.groupby(['M',pd.Grouper(freq='15T', key='DateTime')])[['DateTime','Cumulative']]
.last().rename_axis(['M','TimeBins'])
.reset_index()[cols])
df['Diff'] = df.groupby('M')['Cumulative'].diff()
print (df)
M DateTime TimeBins Cumulative Diff
0 18 2022-08-01 07:14:28 2022-08-01 07:00:00 8 NaN
1 18 2022-08-01 07:29:19 2022-08-01 07:15:00 11 3.0
2 18 2022-08-01 07:38:17 2022-08-01 07:30:00 17 6.0
3 19 2022-08-01 07:13:17 2022-08-01 07:00:00 5 NaN
4 19 2022-08-01 07:29:01 2022-08-01 07:15:00 9 4.0
5 19 2022-08-01 07:41:38 2022-08-01 07:30:00 22 13.0