I have two pandas DataFrames containing time series that must be concatenated for further processing. One DataFrame contains localized timestamps, while the other contains only NaT in the time column. When concatenating, the column type changes from datetime64[ns] to object, which hinders further analysis.
My goal: keep a localized time column, even after concatenation with NaT.
Example code
import pandas as pd

a = pd.DataFrame(
    {
        'DateTime': pd.date_range(
            start='2022-10-10',
            periods=7,
            freq='1D',
            tz='America/New_York'
        ),
        'Value': range(7)
    }
)
b = pd.DataFrame(
    {
        'DateTime': pd.NaT,
        'Value': range(10, 20),
    }
)
c = pd.concat([a, b], axis=0, ignore_index=True)
The dtypes of a and b are different:
>>> print(a.dtypes)
DateTime datetime64[ns, America/New_York]
Value int64
dtype: object
>>> print(b.dtypes)
DateTime datetime64[ns]
Value int64
dtype: object
Since the timestamps in a are localized but the timestamps in b are not, the concatenated column falls back to the object dtype.
>>> print(c.dtypes)
DateTime object
Value int64
dtype: object
When trying to localize b, I get a TypeError:
>>> b['DateTime'] = b['DateTime'].tz_localize('America/New_York')
Traceback (most recent call last):
File "/tmp/so-pandas-nat.py", line 27, in <module>
b['DateTime'] = b['DateTime'].tz_localize('America/New_York')
File ".venv/lib/python3.10/site-packages/pandas/core/generic.py", line 9977, in tz_localize
ax = _tz_localize(ax, tz, ambiguous, nonexistent)
File ".venv/lib/python3.10/site-packages/pandas/core/generic.py", line 9959, in _tz_localize
raise TypeError(
TypeError: index is not a valid DatetimeIndex or PeriodIndex
Answer:
Use Series.dt.tz_localize to localize the values of a column. Series.tz_localize operates on the index instead and expects a DatetimeIndex; here it raises the TypeError because the index of b is a RangeIndex:
b['DateTime'] = b['DateTime'].dt.tz_localize('America/New_York')
c = pd.concat([a, b], axis=0, ignore_index=True)
print(c.dtypes)
DateTime datetime64[ns, America/New_York]
Value int64
dtype: object
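The difference between the two methods, and one way to avoid hard-coding the timezone, can be sketched as follows. This is only an illustration under stated assumptions: localize_like is a hypothetical helper name (not a pandas function), and it assumes both columns already have a datetime dtype so that the .dt accessor is available.

```python
import pandas as pd

# Series.tz_localize acts on the index, so it needs a DatetimeIndex.
s = pd.Series([1, 2], index=pd.date_range('2022-10-10', periods=2))
s_localized = s.tz_localize('America/New_York')  # localizes the index, not the values


def localize_like(series, reference):
    """Hypothetical helper: give `series` the timezone of `reference`.

    Assumes both Series have a datetime dtype (so `.dt` works).
    """
    tz = reference.dt.tz
    if series.dt.tz is None:
        # tz-naive (including all-NaT) values: attach the timezone to the values
        return series.dt.tz_localize(tz)
    # already tz-aware: convert to the reference timezone instead
    return series.dt.tz_convert(tz)


a = pd.DataFrame({
    'DateTime': pd.date_range('2022-10-10', periods=7, freq='1D',
                              tz='America/New_York'),
    'Value': range(7),
})
b = pd.DataFrame({'DateTime': pd.NaT, 'Value': range(10, 20)})

# Series.dt.tz_localize acts on the values, so the RangeIndex of b is irrelevant.
b['DateTime'] = localize_like(b['DateTime'], a['DateTime'])
c = pd.concat([a, b], axis=0, ignore_index=True)
print(c.dtypes)
```

Reading the timezone from a['DateTime'].dt.tz keeps both frames consistent if the timezone of a ever changes; passing the timezone string directly, as in the answer above, works just as well when it is known up front.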