I need to merge two dataframes, one created from an Arrow file and one from a CSV file. Their column types match except for one column, which stores dates.
some_date
---------
2015-07-03 00:00:00 00:00
2015-07-06 00:00:00 00:00
2015-07-07 00:00:00 00:00
2015-07-08 00:00:00 00:00
2015-07-09 00:00:00 00:00
When I read the Arrow file, the corresponding dataframe column has type datetime64[ns, UTC], while the same column in the CSV version of the dataframe has type category.
I need to merge these two dataframes, so I convert the date column of the CSV dataframe to datetime64[ns, UTC]:
csv_data['some_date'] = pd.to_datetime(csv_data['series_value_date'], utc = True)
This works for some dataframes but not for others.
For example, the CSV data below is converted to datetime64[ns, UTC] without problems:
2022-01-08 00:00:00 00:00
2022-01-09 00:00:00 00:00
2022-08-09 00:00:00 00:00
2022-08-10 00:00:00 00:00
2022-08-11 00:00:00 00:00
However, the one below is not. The column stays category even after the explicit conversion:
2015-07-03 00:00:00 00:00
2015-07-06 00:00:00 00:00
2015-07-07 00:00:00 00:00
2015-07-08 00:00:00 00:00
2015-07-09 00:00:00 00:00
2015-07-10 00:00:00 00:00
What could be the difference between the CSV files here? Is there a better way to do this conversion so that it behaves uniformly in all cases?
CodePudding user response:
You may additionally need astype('datetime64[ns, UTC]'):
csv_data['some_date'] = pd.to_datetime(csv_data['series_value_date'], utc = True).astype('datetime64[ns, UTC]')
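If the column still comes back as category, another option is to drop the categorical dtype before parsing. The following is only a minimal sketch under a few assumptions: the helper name to_utc_datetime is hypothetical, the sample frame is made up for illustration, and the strings are assumed to carry a "+00:00" offset (the samples in the question show a space where the sign would be). Whether the categorical dtype is the actual cause of the difference between the two files is not confirmed in this thread.

```python
import pandas as pd


def to_utc_datetime(column: pd.Series) -> pd.Series:
    """Convert a string or categorical column to datetime64[ns, UTC].

    Hypothetical helper, not part of pandas.
    """
    # Cast to plain Python objects first so that pd.to_datetime
    # receives ordinary strings instead of a categorical column.
    as_text = column.astype(object)
    parsed = pd.to_datetime(as_text, utc=True)
    # Pin the exact dtype so it matches the Arrow-derived column.
    return parsed.astype("datetime64[ns, UTC]")


# Made-up sample data: a categorical column with repeated values.
# The "+00:00" offset is an assumption about the real file contents.
raw_dates = ["2015-07-03 00:00:00+00:00", "2015-07-06 00:00:00+00:00"] * 40
csv_data = pd.DataFrame(
    {"series_value_date": pd.Series(raw_dates, dtype="category")}
)

csv_data["some_date"] = to_utc_datetime(csv_data["series_value_date"])
print(csv_data["some_date"].dtype)
```

Checking csv_data['some_date'].dtype right before the merge is a cheap way to confirm that both key columns really have the same type.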