Home > Software design >  Pandas, removing timestamp from datetime column using dt.date or dt.strftime converts column to dtyp
Pandas, removing timestamp from datetime column using dt.date or dt.strftime converts column to dtyp

Time:02-16

Example df of how my data is stored:

df = pd.DataFrame({'DOB': {0: '2003-01-01 00:00:00 00', 1: '2003-01-02 00:00:00 00'}})

I want to convert the column to datetime, and only keep the Year-Month-Day values. Basically, exactly as this question:

Removing the timestamp from a datetime in pandas dataframe

and this one:

Convert column of date objects in Pandas DataFrame to strings

Except using these answer, extracting the dates from the column which is already a Datetime format converts the column to an Object format. This is also discussed in the comments, but seems to have been moved to chat which isn't available anymore.

Result in first answer:

[in] pd.to_datetime(df['DOB']).dt.date
[out]
0    2003-01-01
1    2003-01-02
Name: DOB, dtype: object

Result of second answer:

[in] pd.to_datetime(df['DOB']).dt.strftime('%Y-%m-%d')
[out] 
0    2003-01-01
1    2003-01-02
Name: DOB, dtype: object

As you can see, it's not a Datetime format anymore but an object. Then I can't use it furthermore in my script.

Using the format = "%Y-%m-%d option of pd.to_datetime doesn't work too, as the timestamp is still present:

[in] pd.to_datetime(df['DOB'], format = "%Y-%m-%d")
[out] 
0   2003-01-01 00:00:00 00:00
1   2003-01-02 00:00:00 00:00
Name: DOB, dtype: datetime64[ns, UTC]

What is a simple solution to make sure the column stays in a datetime format, while only keeping the date as format?

CodePudding user response:

IMO there is no issue here:

s = pd.to_datetime(pd.Series(['2021-02-01 00:00:00']))
s
# 0   2021-02-01
# dtype: datetime64[ns]

And indeed, the displayed type is "object":

s.dt.date
# 0    2021-02-01
# dtype: object

But this doesn't mean much, the type is really datetime.date:

type(s.dt.date[0])
# datetime.date

CodePudding user response:

I have it working with two extra lines, but was hoping there would be a simpler solution to this issue:

df['DOB'] = pd.to_datetime(df['DOB'])
df['DOB'] = DF['DOB'].dt.date
df['DOB'] = pd.to_datetime(df['DOB'])

# gives
[in] df
[out]
0    2003-01-01
1    2003-01-02
Name: DOB, dtype: datetime64[ns]
  • Related