The following code fails, saying that the field date2 doesn't have the month attribute because date2 is a Series, when its type is clearly a date. What am I missing?
The error is: AttributeError: 'Series' object has no attribute 'month'
import pandas as pd
import dask
import dask.dataframe as dd
import datetime
pdf = pd.DataFrame({
    'id2': [1, 1, 1, 2, 2],
    'balance': [150, 140, 130, 280, 260],
    'date2': [datetime.datetime(2021, 3, 1), datetime.datetime(2021, 4, 1),
              datetime.datetime(2021, 5, 1), datetime.datetime(2021, 1, 1),
              datetime.datetime(2021, 2, 1)]
})
ddf = dd.from_pandas(pdf, npartitions=1)
def func2(df):
    return df.date2.month
x = ddf.map_partitions(func2) # <-- fails here
CodePudding user response:
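df.date2 is a Series of datetime values rather than a single datetime, so it has no month attribute of its own. To access datetime properties on a Series, use the .dt accessor, so the fix in this case is: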
def func2(df):
    return df.date2.dt.month
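To see why the accessor is needed, here is a minimal pandas-only sketch (no dask involved) that contrasts a single element with the whole column. The two-row frame and the variable names are made up for illustration and are not from the question:

```python
import datetime
import pandas as pd

# Small illustrative frame; only the date2 column name comes from the question.
pdf = pd.DataFrame({
    "date2": [datetime.datetime(2021, 3, 1), datetime.datetime(2021, 4, 1)],
})

# A single element is a Timestamp, which exposes .month directly.
first_month = pdf.date2.iloc[0].month

# The column as a whole is a Series, which has no .month attribute.
series_has_month = hasattr(pdf.date2, "month")

# The .dt accessor applies the datetime property to every element.
months = pdf.date2.dt.month.tolist()
```

Inside map_partitions each partition is an ordinary pandas DataFrame, so the same rule applies there.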
Note that in this case, the function accepts a dataframe, but returns a series. This is fine, but for some use-cases one might be interested in modifying the dataframe and returning the modified version. In such cases, the function would look like this:
def func2(df):
    df['modified_column'] = df.date2.dt.month
    return df