How do I filter a dataframe by a datetime64 column in pandas using minutes and seconds?

Time:09-27

I've converted the time column with type 'str' to datetime64:

try:
    heart_rate_seconds['time'] = pd.to_datetime(heart_rate_seconds['time'], format = "%m/%d/%Y %I:%M:%S %p")
except ValueError:
    heart_rate_seconds['time'] = pd.to_datetime(heart_rate_seconds['time'], format = "%Y/%m/%d %H:%M:%S")

How do I filter the dataframe so that it only contains rows where the time column has minutes and seconds equal to zero?

This is what I've tried:


type(heart_rate_seconds['time'][0])

This displays pandas._libs.tslibs.timestamps.Timestamp


test = heart_rate_seconds['time'][1].second
test

This works


heart_rate_hourly = heart_rate_seconds.loc[heart_rate_seconds['time'].dt.second() == 0 
and heart_rate_seconds['time'].dt.minute() == 0]

This gets an error: TypeError: 'Series' object is not callable


heart_rate_hourly = heart_rate_seconds[heart_rate_seconds['time'].dt.strftime("%M") == 0 
and heart_rate_seconds['time'].dt.strftime("%S") == 0]

This gets an error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().


on_the_hour = []
for time in heart_rate_seconds['time']:
    if time.second == 0 and time.minute == 0:
        on_the_hour.append(time)
on_the_hour[:5]

This works


on_the_hour = []
for row in heart_rate_seconds:
    time = row[1]
    if time.second == 0 and time.minute == 0:
        on_the_hour.append(time)
on_the_hour[:5]

This doesn't work - AttributeError: 'str' object has no attribute 'second'

Thanks

CodePudding user response:

You can use the dt accessor to build boolean masks from datetime properties. For example:

import pandas as pd

df = pd.DataFrame({'time': ['01/03/2021 03:30:00 AM', '2021/02/04 13:00:00'],
                   'values': [0, 1]})

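# note: pandas < 2.0 infers the format of each string separately (drop the
# format argument there); pandas >= 2.0 expects one consistent format, so
# mixed inputs like these need format='mixed':
df['time'] = pd.to_datetime(df['time'], format='mixed')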

# a mask where min and sec are zero:
m = (df['time'].dt.minute == 0) & (df['time'].dt.second == 0)

print(df[m])
#                  time  values
# 1 2021-02-04 13:00:00       1

Note that you do not have to assign the conditionals to a variable to be able to use them - and of course you can also use them within loc.
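For instance, here is a minimal sketch of the same filter written inline with loc. The frame and column names mirror the question (heart_rate_seconds, time); the sample values and the value column are made up for illustration. Note that dt.minute and dt.second are attributes, not methods, so they take no parentheses, and that element-wise conditions are combined with & (each wrapped in parentheses) rather than with and.

```python
import pandas as pd

# Made-up sample data; only the names heart_rate_seconds and 'time'
# come from the question.
heart_rate_seconds = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-02-04 13:00:00",
        "2021-02-04 13:00:05",
        "2021-02-04 13:30:00",
        "2021-02-04 14:00:00",
    ]),
    "value": [70, 72, 75, 71],
})

t = heart_rate_seconds["time"]

# Keep only rows that fall exactly on the hour.
heart_rate_hourly = heart_rate_seconds.loc[(t.dt.minute == 0) & (t.dt.second == 0)]

print(heart_rate_hourly)
```

This also accounts for the last loop in the question: iterating over a DataFrame directly yields its column labels (strings), not its rows, which is why time.second raised an AttributeError there.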
