I have a column of date strings and want to convert it to a date type. I tried pd.to_datetime with the format I want, but the result still includes the time:
import pandas as pd

df = pd.DataFrame({
    'date': ['2010-12-30 23:57:10+00:00', '2010-12-30 23:52:41+00:00', '2010-12-30 23:43:04+00:00',
             '2010-12-30 23:37:30+00:00', '2010-12-30 23:31:39+00:00'],
    'text': ['El odontólogo Barreda, a un paso de quedar en …', 'Defederico es el nuevo refuerzo de Independien..',
             'Israel: ex presidente Katzav declarado culpabl…', 'FMI estima que la recuperación asimétrica de l…',
             '¿Quién fue el campeón argentino del año? Votá …'],
})
df["new date"] = pd.to_datetime(df['date'], format="%Y-%m-%d")
This is the output it returns:
2010-12-30 23:57:10+00:00
and I need to remove the
23:57:10+00:00
part.
CodePudding user response:
Well, it's a datetime object, so it has to keep the time information. However, there's a Period dtype that might fit here: it represents a span of time rather than a single point in time:
df["new date"] = pd.to_datetime(df["date"]).dt.to_period(freq="D")
which converts the timestamps to daily periods:
>>> df["new date"]
0 2010-12-30
1 2010-12-30
2 2010-12-30
3 2010-12-30
4 2010-12-30
Name: new date, dtype: period[D]
Note that these are not strings, so you can still perform .dt-based operations on them.
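As a rough illustration of that point, here is a minimal sketch using a shortened, two-row version of the question's column (the name `dates` is just a placeholder). It assumes the strings carry a `+00:00` offset, and it drops the timezone with `tz_localize(None)` first, because `to_period` on timezone-aware values emits a warning that the timezone information is being dropped:

```python
import pandas as pd

# Two sample strings in the same shape as the question's "date" column.
dates = pd.Series(["2010-12-30 23:57:10+00:00", "2010-12-30 23:52:41+00:00"])

# Drop the UTC offset explicitly, then convert to daily periods.
periods = pd.to_datetime(dates).dt.tz_localize(None).dt.to_period(freq="D")

# The .dt accessor still works on a period column.
print(periods.dt.month.tolist())      # month number of each period
print(periods.dt.start_time.iloc[0])  # first instant covered by the period
```
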
If you do need a datetime dtype, though, you can .normalize() the timestamps: this sets every time component to midnight, signalling that the time of day is immaterial:
>>> df["new date"] = pd.to_datetime(df["date"]).dt.normalize()
>>> df["new date"]
0 2010-12-30 00:00:00+00:00
1 2010-12-30 00:00:00+00:00
2 2010-12-30 00:00:00+00:00
3 2010-12-30 00:00:00+00:00
4 2010-12-30 00:00:00+00:00
Name: new date, dtype: datetime64[ns, UTC]
Lastly, if this is only for display purposes, you can use .strftime to render them as strings in the desired format:
>>> df["new date"] = pd.to_datetime(df["date"]).dt.strftime("%Y-%m-%d")
>>> df["new date"]
0 2010-12-30
1 2010-12-30
2 2010-12-30
3 2010-12-30
4 2010-12-30
Name: new date, dtype: object
As you can see, the dtype here is "object", i.e., string, which rules out datetime-based operations: unlike with the first two alternatives, df["new date"].dt.month would no longer work.
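To make the difference concrete, here is a small sketch comparing the normalized and the string-formatted results. It uses a shortened, two-row version of the question's data, and the variable names are placeholders of my own:

```python
import pandas as pd

dates = pd.Series(["2010-12-30 23:57:10+00:00", "2010-12-30 23:52:41+00:00"])
parsed = pd.to_datetime(dates)

normalized = parsed.dt.normalize()        # still datetime, time set to midnight
as_text = parsed.dt.strftime("%Y-%m-%d")  # plain Python strings

# The datetime accessor is available on the normalized column...
print(normalized.dt.month.tolist())

# ...but not on the string column.
try:
    as_text.dt.month
except AttributeError as err:
    print("no .dt accessor on strings:", err)
```

So .strftime is best kept as the very last step, once all date arithmetic and filtering are done.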
CodePudding user response:
To keep a datetime column and its dt accessor, you can use dt.normalize() to reset the time part to midnight, then dt.tz_convert(None) to remove the timezone information:
df['new date'] = pd.to_datetime(df["date"]).dt.normalize().dt.tz_convert(None)
Output
>>> df['new date']
0 2010-12-30
1 2010-12-30
2 2010-12-30
3 2010-12-30
4 2010-12-30
Name: new date, dtype: datetime64[ns]
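As a quick sanity check, the sketch below (again on a two-row sample, with placeholder variable names) confirms that the result is timezone-naive and compares it with `dt.tz_localize(None)`, which is another way to drop the timezone:

```python
import pandas as pd

dates = pd.Series(["2010-12-30 23:57:10+00:00", "2010-12-30 23:52:41+00:00"])
midnight_utc = pd.to_datetime(dates).dt.normalize()

via_convert = midnight_utc.dt.tz_convert(None)    # convert to UTC, then drop the zone
via_localize = midnight_utc.dt.tz_localize(None)  # keep the wall time, drop the zone

print(via_convert.dt.tz)                 # None
print(via_convert.equals(via_localize))  # the two agree when the data is UTC
print(via_convert.dt.month.tolist())     # .dt still works
```

The two calls agree here only because the data is already in UTC. For any other timezone, tz_convert(None) shifts the values to UTC before dropping the zone, so the normalized timestamps may no longer sit at midnight, whereas tz_localize(None) keeps the local wall time.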