I have a dataset that contains a date_time column in this format:
date_time
01.01.2018 00:00 - 01.01.2018 00:15 (UTC)
01.01.2018 00:15 - 01.01.2018 00:30 (UTC)
01.01.2018 00:30 - 01.01.2018 00:45 (UTC)
01.01.2018 00:45 - 01.01.2018 01:00 (UTC)
.
.
.
After reading this dataset with pandas, I need to convert the date_time column from type object to datetime. I tried the following:
df['date_time'] = pd.to_datetime(df['date_time'])
However, I get this error:
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 643, in parse
raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: 01.01.2018 00:00 - 01.01.2018 00:15 (UTC)
Clearly, pandas does not recognize the format of the date_time column. How can I solve this?
CodePudding user response:
First off, what you have is a range, not a single datetime.
Start with splitting it into start and end datetimes:
df[['dt_start', 'dt_end']] = df['date_time'].str.split(' - ', expand = True)
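To see what the split produces, here is a minimal sketch. The two-row frame is an illustrative assumption built from the sample rows in the question, not the real dataset.

```python
import pandas as pd

# Sample rows copied from the question; the frame itself is an assumption.
df = pd.DataFrame({
    "date_time": [
        "01.01.2018 00:00 - 01.01.2018 00:15 (UTC)",
        "01.01.2018 00:15 - 01.01.2018 00:30 (UTC)",
    ]
})

# expand=True returns a DataFrame with one column per piece,
# so both new columns can be assigned in a single statement.
df[["dt_start", "dt_end"]] = df["date_time"].str.split(" - ", expand=True)

print(df[["dt_start", "dt_end"]])
```

Note that the end piece still carries the trailing (UTC) text, while the start piece does not.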
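Now convert the start datetime. The dates are day-first, so pass the format explicitly: with default arguments, an ambiguous value such as 01.02.2018 would be read as January 2 (use utc=True if you want to keep it timezone-aware):
df['dt_start'] = pd.to_datetime(df['dt_start'], format='%d.%m.%Y %H:%M', utc=True)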
The end datetime needs the format supplied manually, because to_datetime with default arguments cannot parse the trailing (UTC) marker:
df['dt_end'] = pd.to_datetime(df['dt_end'], format = '%d.%m.%Y %H:%M (UTC)', utc = True)
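Putting the steps together, a self-contained sketch might look like the following. The sample frame, the second row, and the duration column are illustrative assumptions added only as a sanity check. An explicit day-first format is passed for both columns so that no day/month guessing is involved.

```python
import pandas as pd

df = pd.DataFrame({
    "date_time": [
        "01.01.2018 00:00 - 01.01.2018 00:15 (UTC)",
        "13.01.2018 23:45 - 14.01.2018 00:00 (UTC)",  # hypothetical row, day > 12
    ]
})

# Split each range into its start and end strings.
parts = df["date_time"].str.split(" - ", expand=True)

# Parse both halves with explicit day-first formats; the literal "(UTC)"
# in the second format matches the trailing marker on the end string.
df["dt_start"] = pd.to_datetime(parts[0], format="%d.%m.%Y %H:%M", utc=True)
df["dt_end"] = pd.to_datetime(parts[1], format="%d.%m.%Y %H:%M (UTC)", utc=True)

# Sanity check: each interval in the sample should span 15 minutes.
df["duration"] = df["dt_end"] - df["dt_start"]
print(df[["dt_start", "dt_end", "duration"]])
```

Both new columns end up timezone-aware, so they can be subtracted or compared directly.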