I have a dataframe like this:
df = pd.DataFrame({"DateTime":["2020-04-02 06:06:22",
"2020-04-02 06:12:22",
"2020-04-02 06:14:39",
"2020-04-02 06:16:56",
"2020-04-02 06:20:34",
"2020-04-02 06:35:44"],
"Data":[23, 31, 10, 23, 56, 81]})
# column DateTime type must be datetime64[ns]
df["DateTime"] = df["DateTime"].astype("datetime64[ns]")
df
Out[4]:
DateTime Data
0 2020-04-02 06:06:22 23
1 2020-04-02 06:12:22 31
2 2020-04-02 06:14:39 10
3 2020-04-02 06:16:56 23
4 2020-04-02 06:20:34 56
5 2020-04-02 06:35:44 81
I would like to select rows after every 10 min. So my dataframe should be like:
DateTime Data
0 2020-04-02 06:06:22 23
3 2020-04-02 06:16:56 23
5 2020-04-02 06:35:44 81
This solution How to drop rows based on datetime (every 15 min)? drops rows every 15 min but always looking at the exactly row below, so it deletes rows that I don't want. And actually I would like to select rows after a specific time range.
Anyone could help me?
CodePudding user response:
You could count the time elapsed from your first row and then divide this elapsed time into 10 minute buckets using an integer division -
df['time_elapsed'] = (df['DateTime'] - df['DateTime'].shift(1)).fillna(pd.Timedelta(seconds=0)).cumsum()
df['ten_min_bucket'] = pd.to_datetime(df['time_elapsed']).dt.minute // 10
df.groupby('ten_min_bucket')[['DateTime', 'Data']].agg('first')
Output
DateTime Data
ten_min_bucket
0 2020-04-02 06:06:22 23
1 2020-04-02 06:16:56 23
2 2020-04-02 06:35:44 81
CodePudding user response:
This looks like a job for merge_asof
:
# set up indexer DataFrame
df2 = pd.DataFrame({'idx': pd.date_range(df['DateTime'].min(),
df['DateTime'].max(),
freq='10min')
})
# get first value for each slice of 10 minutes
out = (pd.merge_asof(df2, df, left_on='idx', right_on='DateTime',
direction='forward')
#.drop(columns='idx') # uncomment to remove idx
)
output:
idx DateTime Data
0 2020-04-02 06:06:22 2020-04-02 06:06:22 23
1 2020-04-02 06:16:22 2020-04-02 06:16:56 23
2 2020-04-02 06:26:22 2020-04-02 06:35:44 81
output with .drop(columns='idx')
:
DateTime Data
0 2020-04-02 06:06:22 23
1 2020-04-02 06:16:56 23
2 2020-04-02 06:35:44 81