Issue for converting a datetime format using pandas dataframe

Time:11-20

I have a dataset that contains a date_time column in this format:

date_time
01.01.2018 00:00 - 01.01.2018 00:15 (UTC)
01.01.2018 00:15 - 01.01.2018 00:30 (UTC)
01.01.2018 00:30 - 01.01.2018 00:45 (UTC)
01.01.2018 00:45 - 01.01.2018 01:00 (UTC)
.
.
.

After reading this dataset with pandas, I need to convert the date_time column from type object to datetime, so I tried the following:

df['date_time'] = pd.to_datetime(df['date_time'])

However, I get this error:

  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 643, in parse
    raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: 01.01.2018 00:00 - 01.01.2018 00:15 (UTC)

Clearly, pandas does not recognize the original format of date_time. How can I solve this?

CodePudding user response:

First off, what you have is a range, not a single datetime.

Start by splitting it into start and end datetimes:

df[['dt_start', 'dt_end']] = df['date_time'].str.split(' - ', expand = True)
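
To see what the split leaves in each column, here is a minimal sketch on two of the sample rows from the question (the DataFrame is built by hand purely for illustration). Note that the end column still carries the trailing "(UTC)" marker at this point:

```python
import pandas as pd

# Hand-built sample mirroring rows shown in the question.
df = pd.DataFrame({
    "date_time": [
        "01.01.2018 00:00 - 01.01.2018 00:15 (UTC)",
        "01.01.2018 00:15 - 01.01.2018 00:30 (UTC)",
    ]
})

# Split each range on the " - " separator into two string columns.
df[["dt_start", "dt_end"]] = df["date_time"].str.split(" - ", expand=True)

# Both new columns are still plain strings (dtype object) here.
print(df[["dt_start", "dt_end"]])
```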

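Now the starting datetime can be converted. Pass the format explicitly, because the dates are day-first and the default parser would otherwise read an ambiguous value such as 02.01.2018 as month-first (use utc = True if you want to keep it timezone-aware):

df['dt_start'] = pd.to_datetime(df['dt_start'], format = '%d.%m.%Y %H:%M', utc = True)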

The ending datetime also needs an explicit format, because the to_datetime parser with default arguments cannot recognize the trailing (UTC) marker. Include the marker literally in the format string:

df['dt_end'] = pd.to_datetime(df['dt_end'], format = '%d.%m.%Y %H:%M (UTC)', utc = True)
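
As an alternative to splitting, both timestamps can be pulled out in one pass with a regular expression. The following is only a sketch under stated assumptions: the helper name parse_ranges is hypothetical, every row is assumed to follow the "start - end (UTC)" layout from the question, and the second sample row is made up to show that a day greater than 12 is handled correctly.

```python
import pandas as pd

FMT = "%d.%m.%Y %H:%M"  # day.month.year hour:minute, as in the question's data


def parse_ranges(series: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: turn 'start - end (UTC)' strings into two UTC datetime columns."""
    # Named groups become the column names of the extracted DataFrame.
    parts = series.str.extract(r"^(?P<dt_start>.+?) - (?P<dt_end>.+?) \(UTC\)$")
    out = pd.DataFrame(index=series.index)
    for col in ("dt_start", "dt_end"):
        # An explicit format avoids any day-first / month-first guessing.
        out[col] = pd.to_datetime(parts[col], format=FMT, utc=True)
    return out


raw = pd.Series([
    "01.01.2018 00:00 - 01.01.2018 00:15 (UTC)",
    "13.01.2018 23:45 - 14.01.2018 00:00 (UTC)",  # made-up row for illustration
])
result = parse_ranges(raw)
print(result.dtypes)
```

Rows that do not match the pattern come back as NaT in both columns, so it is worth checking result.isna().sum() before relying on the output.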