Home > Enterprise >  Selecting only the rows where dataframe times that where hourly ends in HH:00:00, with 00:00 being m
Selecting only the rows where dataframe times that where hourly ends in HH:00:00, with 00:00 being m

Time:08-22

I have a lot of data from a lot of different datasets with different time frames (hourly, every 5 minutes, and every minute). I decided to get all of the data on even times, and only want the data ending in YYYY:MM:DD HH:00:00 (I have decades of data on this).

I have tried a few different methods to filter out only the data I want:

df.loc[starting_row_value::value_to_skip_by] but unfortunately there is some missing data so I start off with the HH:00:00, but by the end in a few different frames it ends up being HH:00:05 or HH:00:55, so missing data is messing this solution up

I also tried df[df.time_column[-5:] == 00:00], but that gives me:

TypeError: cannot do slice indexing on RangeIndex with these indexers

with a few false values Name: time, dtype: bool] of type Series

I've done a lot of looking, and couldn't find anything for filter by specific hours. Does anyone have any ideas on what I could do? Any help would be much appreciated!

Edit: dtypes for reach dataframe are as follows:

DATE (MM/DD/YYYY)                object
MST                              object
Global PSP [W/m^2]              float64
Direct NIP [W/m^2]              float64
Reflected PSP [W/m^2]           float64
time                     datetime64[ns]
dtype: object

Everything but the time column was kept as is, whereas I used the following code to create the dataframe columns

df['time'] = pd.to_datetime(df['DATE (MM/DD/YYYY)']   ' '  df['MST']

CodePudding user response:

Assuming you are working with pd.Timestamp values, you could do the following:

import pandas as pd

df = pd.DataFrame([
    pd.to_datetime('2022-04-06 11:00:00'),
    pd.to_datetime('2022-04-06 11:00:05')
], columns=['time_column'])

idx1 = df['time_column'].dt.minute == 0
idx2 = df['time_column'].dt.second == 0
df2 = df[idx1 & idx2]
print(df2)

prints

index time_column
0 2022-04-06 11:00:00
  • Related