Home > Back-end >  How do we detect date and datetime columns separately from a dataframe?
How do we detect date and datetime columns separately from a dataframe?

Time:09-26

I currently use pd.read_csv to read the dataframe. It does not detect both date and datetime columns, but instead checks it out as object. So, I use the below snippet to find date/datetime columns:

df = pd.read_csv(filename)
collist = []
for col in df.columns:
    if df[col].dtype == 'object':
        try:
            df[col] = pd.to_datetime(df[col])
            collist.append(col)
        except ValueError:
            pass
return collist

But my use case needs me to have date columns and datetime columns separately. Is there a way how we can filter out date and datetime columns separately?

import pandas as pd
df = pd.DataFrame({
    'date1':['4/10/2021', '4/11/2021','4/12/2021'],
    'date_time1': ['4/11/2021 13:23:45', '4/11/2021 13:23:45', '4/11/2021 13:23:45'],
    'Name': ['Bob', 'Jane', 'Alice'],
    'date_time2': ['4/12/2021 13:23:45', '4/13/2021 13:23:45', '4/14/2021 13:23:45']
})

So, date column list should give me [date1] and datetime column list should give me [date_time1,date_time2]

CodePudding user response:

Since you have already read the data and converted everything to datetime and were storing it in collist - (datecollist), use the below snippet to parse these timestamps and distinguish between date and datetime.

datetime_col_list = []
df = pd.read_csv(filename, delimiter=delimiter, encoding=encoding, parse_dates=date_collist)
for col in date_collist:
        first_index = df[col].first_valid_index()
        first_valid_value = df[col].loc[first_index]
        if (str(first_valid_value).split(' ')[1]) != '00:00:00':
            datetime_col_list.append(col)

date_list = list (set(date_collist) - set(datetime_col_list))
print(date_list)
print(datetime_col_list)

CodePudding user response:

not exactly sure if that satisfies your needs but to distinguish between date and datetime inputs, you could check if for a given datetime column, all times are set to the default zero:

import pandas as pd

df = pd.DataFrame({
    'date1':['4/10/2021', '4/11/2021','4/12/2021'],
    'date_time1': ['4/11/2021 13:23:45', '4/11/2021 13:23:45', '4/11/2021 13:23:45'],
    'Name': ['Bob', 'Jane', 'Alice'],
    'date_time2': ['4/12/2021 13:23:45', '4/13/2021 13:23:45', '4/14/2021 13:23:45'],
    'numbers': [1, 2, 3]
})


col_dateonly, col_datetime, col_other = [], [], []
for col in df.columns:
    if pd.to_numeric(df[col], errors='coerce').notnull().all():
        col_other.append(col)
    else:
        try:
            df[col] = pd.to_datetime(df[col])
        except ValueError:
            col_other.append(col)
            continue
        else:
            if (df[col] == df[col].dt.floor('D')).all():
                col_dateonly.append(col)
            else:
                col_datetime.append(col)

print(col_dateonly)
print(col_datetime)
print(col_other)
# ['date1']
# ['date_time1', 'date_time2']
# ['Name', 'numbers']
  • Related