I currently use pd.read_csv to read the dataframe. It does not detect both date and datetime columns, but instead checks it out as object. So, I use the below snippet to find date/datetime columns:
df = pd.read_csv(filename)
collist = []
for col in df.columns:
if df[col].dtype == 'object':
try:
df[col] = pd.to_datetime(df[col])
collist.append(col)
except ValueError:
pass
return collist
But my use case needs me to have date columns and datetime columns separately. Is there a way how we can filter out date and datetime columns separately?
import pandas as pd
df = pd.DataFrame({
'date1':['4/10/2021', '4/11/2021','4/12/2021'],
'date_time1': ['4/11/2021 13:23:45', '4/11/2021 13:23:45', '4/11/2021 13:23:45'],
'Name': ['Bob', 'Jane', 'Alice'],
'date_time2': ['4/12/2021 13:23:45', '4/13/2021 13:23:45', '4/14/2021 13:23:45']
})
So, date column list should give me [date1] and datetime column list should give me [date_time1,date_time2]
CodePudding user response:
Since you have already read the data and converted everything to datetime and were storing it in collist - (datecollist), use the below snippet to parse these timestamps and distinguish between date and datetime.
datetime_col_list = []
df = pd.read_csv(filename, delimiter=delimiter, encoding=encoding, parse_dates=date_collist)
for col in date_collist:
first_index = df[col].first_valid_index()
first_valid_value = df[col].loc[first_index]
if (str(first_valid_value).split(' ')[1]) != '00:00:00':
datetime_col_list.append(col)
date_list = list (set(date_collist) - set(datetime_col_list))
print(date_list)
print(datetime_col_list)
CodePudding user response:
not exactly sure if that satisfies your needs but to distinguish between date and datetime inputs, you could check if for a given datetime column, all times are set to the default zero:
import pandas as pd
df = pd.DataFrame({
'date1':['4/10/2021', '4/11/2021','4/12/2021'],
'date_time1': ['4/11/2021 13:23:45', '4/11/2021 13:23:45', '4/11/2021 13:23:45'],
'Name': ['Bob', 'Jane', 'Alice'],
'date_time2': ['4/12/2021 13:23:45', '4/13/2021 13:23:45', '4/14/2021 13:23:45'],
'numbers': [1, 2, 3]
})
col_dateonly, col_datetime, col_other = [], [], []
for col in df.columns:
if pd.to_numeric(df[col], errors='coerce').notnull().all():
col_other.append(col)
else:
try:
df[col] = pd.to_datetime(df[col])
except ValueError:
col_other.append(col)
continue
else:
if (df[col] == df[col].dt.floor('D')).all():
col_dateonly.append(col)
else:
col_datetime.append(col)
print(col_dateonly)
print(col_datetime)
print(col_other)
# ['date1']
# ['date_time1', 'date_time2']
# ['Name', 'numbers']