Supposed 3 kids are having a contest to see who can sell the most candies, chocolate bars and cookies over a couple of days. They started their contest that very day at 08:15:00 (8.15am) and agreed to enter their sale into a tracker as shown in the data frame below:
df = pd.DataFrame({
'Name': ['Harvey', 'Khala', 'Gaddy', 'Harvey', 'Khala', 'Gaddy', 'Harvey', 'Khala', 'Gaddy', 'Khala', 'Harvey', 'Gaddy'],
'Timestamp': ['2022-01-01 08:17:23.12', '2022-01-01 08:22:58.76', '2022-01-01 08:19:02.57', '2022-01-01 08:55:43.99','2022-01-01 08:41:23.10', '2022-01-01 09:14:59.99', '2022-01-01 09:15:02.02', '2022-01-01 09:44:43.30','2022-01-01 09:54:23.71', '2022-01-01 10:15:00.00', '2022-01-01 10:15:02.99', '2022-01-01 10:19:43.52'],
'Candy': [2, 1, 3, 3, 5, 4, 6, 6, 4, 10, 9, 14],
'Chocolate Bars': [4, np.nan, 6, 7, 8, 6, 7, 13, 10, 19, 11, 11],
'Cookies': [1, 1, 4, 2, 4, 5, 5, 8, 11, 8, 15, 17]
})
Name Timestamp Candy Chocolate Bars Cookies
0 Harvey 2022-01-01 08:17:23.12 2 4 1
1 Khala 2022-01-01 08:22:58.76 1 NaN 1
2 Gaddy 2022-01-01 08:19:02.57 3 6 4
3 Harvey 2022-01-01 08:55:43.99 3 7 2
4 Khala 2022-01-01 08:41:23.10 5 8 4
5 Gaddy 2022-01-01 09:14:59.99 4 6 5
6 Harvey 2022-01-01 09:15:02.02 6 7 5
7 Khala 2022-01-01 09:44:43.30 6 13 8
8 Gaddy 2022-01-01 09:54:23.71 4 10 11
9 Khala 2022-01-01 10:15:00.00 10 19 8
10 Harvey 2022-01-01 10:15:02.99 9 11 15
11 Gaddy 2022-01-01 10:19:43.52 14 11 17
Now the intention is to create a new data frame that captures each child's latest sale in 1-hour intervals (an example of an hour's window would be 08:15:00.00 - 09:14:59.99) and the window they were captured in. Such that the data frame will look like this:
Name Window Timestamp Candy Chocolate Bars Cookies
1 Harvey 09:15:00.00 2022-01-01 08:55:43.99 3 7 2
2 Khala 09:15:00.00 2022-01-01 08:41:23.10 5 8 4
3 Gaddy 09:15:00.00 2022-01-01 09:14:59.99 4 6 5
4 Harvey 10:15:00.00 2022-01-01 09:15:02.02 6 7 5
5 Khala 10:15:00.00 2022-01-01 09:44:43.30 6 13 8
6 Gaddy 10:15:00.00 2022-01-01 09:54:23.71 4 10 11
7 Khala 11:15:00.00 2022-01-01 10:15:00.00 10 19 8
8 Harvey 11:15:00.00 2022-01-01 10:15:02.99 9 11 15
9 Gaddy 11:15:00.00 2022-01-01 10:19:43.52 14 11 17
CodePudding user response:
The first thing I would do would be to convert the timestamp column to datetime to make it more easy to work with
import numpy as np
import pandas as pd
df = pd.DataFrame({
'Name': ['Harvey', 'Khala', 'Gaddy', 'Harvey', 'Khala', 'Gaddy', 'Harvey', 'Khala', 'Gaddy', 'Khala', 'Harvey', 'Gaddy'],
'Timestamp': ['2022-01-01 08:17:23.12', '2022-01-01 08:22:58.76', '2022-01-01 08:19:02.57', '2022-01-01 08:55:43.99','2022-01-01 08:41:23.10', '2022-01-01 09:14:59.99', '2022-01-01 09:15:02.02', '2022-01-01 09:44:43.30','2022-01-01 09:54:23.71', '2022-01-01 10:15:00.00', '2022-01-01 10:15:02.99', '2022-01-01 10:19:43.52'],
'Candy': [2, 1, 3, 3, 5, 4, 6, 6, 4, 10, 9, 14],
'Chocolate Bars': [4, np.nan, 6, 7, 8, 6, 7, 13, 10, 19, 11, 11],
'Cookies': [1, 1, 4, 2, 4, 5, 5, 8, 11, 8, 15, 17]
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
Then the next step would be to add the window column
# Get window
window_start = pd.to_timedelta("15min")
df["Window"] = (df["Timestamp"] - window_start).dt.floor("1h") window_start
Which you can do by first shifting the times by 15 minutes take only the hours and then add back the 15 minutes. If you don't want to keep the date in the window that is possible as well.
The last step is to sort the timestamps and only keep one per window and person
# Keep only one row per window and person
df = df.sort_values("Timestamp", ascending=False).groupby(["Name", "Window"]).head(1)
df = df.sort_index().reset_index(drop=True)
CodePudding user response:
After casting the Timestamp
column to datetime, you could use DataFrame .groupby method combined with .resample method:
df["Timestamp"] = pd.to_datetime(df["Timestamp")
cols = ['Candy', 'Chocolate Bars', 'Cookies']
(df
.groupby("Name")
.resample("60T", offset="15T", on="Timestamp", label="right")
.last()
.loc[:, cols]
.reset_index()
.sort_values("Timestamp")
)
Name Timestamp Candy Chocolate Bars Cookies
0 Gaddy 2022-01-01 09:15:00 4 6.0 5
3 Harvey 2022-01-01 09:15:00 3 7.0 2
6 Khala 2022-01-01 09:15:00 5 8.0 4
1 Gaddy 2022-01-01 10:15:00 4 10.0 11
4 Harvey 2022-01-01 10:15:00 6 7.0 5
7 Khala 2022-01-01 10:15:00 6 13.0 8
2 Gaddy 2022-01-01 11:15:00 14 11.0 17
5 Harvey 2022-01-01 11:15:00 9 11.0 15
8 Khala 2022-01-01 11:15:00 10 19.0 8