I want to extend the datetime class with some additional functionality. Following the guidance available, e.g., here and here, I have prepared the following class:
import datetime


class CPyTime(datetime.datetime):
    # month defaults to 1, since datetime rejects month=0
    def __new__(cls, year, month=1, day=1):
        return super().__new__(cls, year, month, day)

    # Additional constructors
    @classmethod
    def from_my_own_date(cls, year_string=None, month_string=None):
        year = int(year_string)
        month = int(month_string)
        obj = cls(year, month)
        assert isinstance(obj, cls), "{}: wrong object type returned".format(CPyTime.from_my_own_date.__name__)
        return obj

    @property
    def year_plus_month(self):
        return self.year + self.month
The class seems to work fine by itself, as shown in the following code snippet:
>>> my_date = CPyTime(2021, 10)
>>> my_date_custom = CPyTime.from_my_own_date("2021", "12")
>>> print(f"{my_date}, {my_date.year_plus_month}")
2021-10-01 00:00:00, 2031
>>> print(f"{my_date_custom}, {my_date_custom.year_plus_month}")
2021-12-01 00:00:00, 2033
>>> type(my_date)
<class '__main__.CPyTime'>
The problem I face is that when the class is used inside a pandas DataFrame, pandas seems to convert CPyTime to Timestamp automatically, so the additional functionality of CPyTime is lost. The following code snippet shows the problem:
import pandas as pd
pdf = pd.DataFrame(data=[[2021, 1], [2021, 2], [2021, 3], [2021, 4]], columns=["Year", "Month"])
pdf["OwnDate"] = pdf.apply(lambda row: CPyTime(row["Year"], row["Month"]), axis=1)
Then, the dataframe is created and contains the new column "OwnDate":
>>> pdf
   Year  Month    OwnDate
0  2021      1 2021-01-01
1  2021      2 2021-02-01
2  2021      3 2021-03-01
3  2021      4 2021-04-01
However, the data type of the "OwnDate" column is datetime64[ns], and the additional functionality of CPyTime is not available:
>>> pdf.dtypes
Year                int64
Month               int64
OwnDate    datetime64[ns]
dtype: object
>>> pdf["OwnDate"][0].year_plus_month
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.1\plugins\python-ce\helpers\pydev\_pydevd_bundle\pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
AttributeError: 'Timestamp' object has no attribute 'year_plus_month'
Can anyone please help me solve this problem? Is it possible to use a class derived from datetime in a pandas DataFrame without losing the additional functionality of the derived class?
Answer:
One option is to create a Series from a list, setting the dtype explicitly to object so that pandas does not infer datetime64[ns]:
import pandas as pd
pdf = pd.DataFrame(data=[[2021, 1], [2021, 2], [2021, 3], [2021, 4]], columns=["Year", "Month"])
pdf["OwnDate"] = pd.Series(
    [CPyTime(row["Year"], row["Month"]) for _, row in pdf.iterrows()],
    dtype='object'
)
print(pdf.dtypes)
# Year        int64
# Month       int64
# OwnDate    object
# dtype: object
print(pdf["OwnDate"][0].year_plus_month)
# 2022
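One consequence of storing the dates with dtype object is that the column no longer offers the vectorized .dt accessor, so the custom property has to be applied element by element. The following is only a minimal sketch of that idea, not a definitive recipe: it repeats a condensed version of the CPyTime class from the question so that it runs on its own, and the variable names (dates, inferred, kept, year_plus_month) are chosen here for illustration.

```python
import datetime

import pandas as pd


class CPyTime(datetime.datetime):
    """Condensed copy of the question's class, repeated so the sketch is self-contained."""

    def __new__(cls, year, month=1, day=1):
        return super().__new__(cls, year, month, day)

    @property
    def year_plus_month(self):
        return self.year + self.month


dates = [CPyTime(2021, month) for month in range(1, 5)]

# Without an explicit dtype, pandas infers a datetime64 column, and the
# elements come back as Timestamp objects (the behaviour seen in the question).
inferred = pd.Series(dates)

# With dtype="object", pandas stores the original Python objects as they are.
kept = pd.Series(dates, dtype="object")

# Object columns have no .dt accessor, so apply the property element-wise.
year_plus_month = kept.map(lambda d: d.year_plus_month)

print(inferred.dtype)
print(type(kept.iloc[0]).__name__)   # CPyTime
print(year_plus_month.tolist())      # [2022, 2023, 2024, 2025]
```

The trade-off is that element-wise calls such as map run in Python rather than in vectorized code, which may matter for large frames.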
See also How to prevent Pandas from converting datetimes to datetime64.