Home > Enterprise >  How to compare just the date or just date time ignoring seconds in a Python Pandas dataframe column
How to compare just the date or just date time ignoring seconds in a Python Pandas dataframe column

Time:05-13

In a pandas dataframe, I have a column of mixed data types, such as text, integers and datetimes. I need to find columns where datetimes match: (1) exact values in some cases, (2) only the date (ignoring time), or (3) only the date and time, but ignoring seconds.

In the following code example with a mixed data type dataframe column, there are three dates of varying imprecision. Mapping the conditions into a separate dataframe works for a precise value.

import pandas as pd
import numpy as np
# example data frame
inp = [{'Id': 0, 'mixCol': np.nan},
       {'Id': 1, 'mixCol': "text"},
       {'Id': 2, 'mixCol': 43831},
       {'Id': 3, 'mixCol': pd.to_datetime("2020-01-01 00:00:00")}, 
       {'Id': 4, 'mixCol': pd.to_datetime("2020-01-01 01:01:00")},
       {'Id': 5, 'mixCol': pd.to_datetime("2020-01-01 01:01:01")}
       ]
df = pd.DataFrame(inp)
print(df.dtypes)

myMap = pd.DataFrame()
myMap["Exact"] = df["mixCol"] == pd.to_datetime("2020-01-01 01:01:01")

0   False
1   False
2   False
3   False
4   False
5   True
6   False

The output I need should be:

Id   Exact    DateOnly    NoSeconds
0    False    False       False
1    False    False       False
2    False    False       False
3    False    True        False
0    False    True        True 
5    True     True        True 
6    False    False       False 

BUT, mapping just the date, without time, maps as if the date had a time of 00:00:00.

myMap["DateOnly"] = df["mixCol"] == pd.to_datetime("2020-01-01")

Id   Exact    DateOnly
0    False    False   
1    False    False  
2    False    False  
3    False    True   
0    False    False  
5    True     False  
6    False    False  

Trying to convert values in the mixed column throws an AttributeError: 'Series' object has not attribute 'date'; and trying to use ">" and "<" to define the relevant range throws a TypeError: '>=' not supported between instances of 'str' and 'Timestamp'

myMap["DateOnly"] = df["mixCol"].date == pd.to_datetime("2020-01-01")
myMap["NoSeconds"] = (df["mixCol"] >= pd.to_datetime("2020-01-01 01:01:00")) & (df["mixCol"] < pd.to_datetime("2020-01-01 01:02:00"))

If I try to follow the solution for mix columns in pandas proposed here, both the np.nan and text value map true as dates.

df["IsDate"] = df.apply(pd.to_datetime, errors='coerce',axis=1).nunique(1).eq(1).map({True:True ,False:False})

I'm not sure how to proceed in this situation?

CodePudding user response:

Use Series.dt.normalize for compare datetimes with remove times (set them to 00:00:00) or with Series.dt.floor by days or minutes for remove seconds:

#convert column to all datetimes with NaT
d = pd.to_datetime(df["mixCol"], errors='coerce')
myMap["DateOnly"] = d.dt.normalize() == pd.to_datetime("2020-01-01")
myMap["DateOnly"] = d.dt.floor('D') == pd.to_datetime("2020-01-01")

#alternative with dates
myMap["DateOnly"] = d.dt.date == pd.to_datetime("2020-01-01").date()

myMap['NoSeconds'] = d.dt.floor('Min') == pd.to_datetime("2020-01-01 01:01:00")

print (myMap)
   Exact  DateOnly  NoSeconds
0  False     False      False
1  False     False      False
2  False     False      False
3  False      True      False
4  False      True       True
5   True      True       True
  • Related