How to convert a column in a pandas DataFrame from category to datetime UTC?

Time:09-04

I have to merge two dataframes, one created from an arrow file and one from a csv file. The dataframes have matching column types except for one column, which stores dates.

some_date
---------
2015-07-03 00:00:00+00:00
2015-07-06 00:00:00+00:00
2015-07-07 00:00:00+00:00
2015-07-08 00:00:00+00:00
2015-07-09 00:00:00+00:00

When I read the arrow file, the corresponding dataframe column has the type datetime64[ns, UTC], whereas in the dataframe read from the csv file the same column has the type category.

I need to merge these two dataframes, so I convert the date column of the csv dataframe to datetime64[ns, UTC].

csv_data['some_date'] = pd.to_datetime(csv_data['series_value_date'], utc=True)

This works for some dataframes but not for others.

For example, the csv data below is converted to datetime64[ns, UTC] just fine.

2022-01-08 00:00:00+00:00
2022-01-09 00:00:00+00:00
2022-08-09 00:00:00+00:00
2022-08-10 00:00:00+00:00
2022-08-11 00:00:00+00:00

However, the data below is not. The column stays category even after I convert it explicitly.

2015-07-03 00:00:00+00:00
2015-07-06 00:00:00+00:00
2015-07-07 00:00:00+00:00
2015-07-08 00:00:00+00:00
2015-07-09 00:00:00+00:00
2015-07-10 00:00:00+00:00

What could be the difference between the csv files here? Is there a better way to do this conversion so that it behaves uniformly across all cases?

CodePudding user response:

You may additionally need to chain astype('datetime64[ns, UTC]') after the conversion:

csv_data['some_date'] = pd.to_datetime(csv_data['series_value_date'], utc=True).astype('datetime64[ns, UTC]')
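
If the conversion should behave the same way for every file, one option is to drop the categorical dtype before parsing and pin the target dtype afterwards. The following is only a sketch under assumptions: to_utc_datetime is a hypothetical helper name, the sample values are made up, and the thread does not confirm why some files keep the category dtype.

```python
import pandas as pd


def to_utc_datetime(column: pd.Series) -> pd.Series:
    """Parse a string or categorical column into datetime64[ns, UTC].

    Hypothetical helper, not part of pandas.
    """
    # Cast to plain Python objects first so the categorical dtype
    # cannot carry over into the parsed result.
    as_objects = column.astype("object")
    parsed = pd.to_datetime(as_objects, utc=True)
    # Pin the exact dtype that the arrow-derived dataframe uses.
    return parsed.astype("datetime64[ns, UTC]")


# Made-up sample standing in for the csv column read as category.
csv_data = pd.DataFrame({
    "series_value_date": pd.Categorical([
        "2015-07-03 00:00:00+00:00",
        "2015-07-06 00:00:00+00:00",
        "2015-07-06 00:00:00+00:00",
    ])
})

csv_data["some_date"] = to_utc_datetime(csv_data["series_value_date"])
print(csv_data["some_date"].dtype)
```

Checking csv_data['some_date'].dtype against the dtype of the arrow dataframe's column right before the merge makes any remaining mismatch visible early.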