I have a lot of data from a lot of different datasets with different time frames (hourly, every 5 minutes, and every minute). I decided to get all of the data on even times, and only want the data ending in YYYY:MM:DD HH:00:00 (I have decades of data on this).
I have tried a few different methods to filter out only the data I want:
df.loc[starting_row_value::value_to_skip_by] but unfortunately there is some missing data so I start off with the HH:00:00, but by the end in a few different frames it ends up being HH:00:05 or HH:00:55, so missing data is messing this solution up
I also tried df[df.time_column[-5:] == 00:00]
, but that gives me:
TypeError: cannot do slice indexing on RangeIndex with these indexers
with a few false values Name: time, dtype: bool] of type Series
I've done a lot of looking, and couldn't find anything for filter by specific hours. Does anyone have any ideas on what I could do? Any help would be much appreciated!
Edit: dtypes for reach dataframe are as follows:
DATE (MM/DD/YYYY) object
MST object
Global PSP [W/m^2] float64
Direct NIP [W/m^2] float64
Reflected PSP [W/m^2] float64
time datetime64[ns]
dtype: object
Everything but the time column was kept as is, whereas I used the following code to create the dataframe columns
df['time'] = pd.to_datetime(df['DATE (MM/DD/YYYY)'] ' ' df['MST']
CodePudding user response:
Assuming you are working with pd.Timestamp
values, you could do the following:
import pandas as pd
df = pd.DataFrame([
pd.to_datetime('2022-04-06 11:00:00'),
pd.to_datetime('2022-04-06 11:00:05')
], columns=['time_column'])
idx1 = df['time_column'].dt.minute == 0
idx2 = df['time_column'].dt.second == 0
df2 = df[idx1 & idx2]
print(df2)
prints
index | time_column |
---|---|
0 | 2022-04-06 11:00:00 |