Dask map_partitions fails when dataframe contains date field

Time:12-28

The following code fails with an error saying that date2 has no month attribute because it is a Series, even though the column clearly holds dates. What am I missing?

Error is AttributeError: 'Series' object has no attribute 'month'

import pandas as pd
import dask
import dask.dataframe as dd
import datetime

pdf = pd.DataFrame({
    'id2': [1, 1, 1, 2, 2],
    'balance': [150, 140, 130, 280, 260],
    'date2' : [datetime.datetime(2021,3,1), datetime.datetime(2021,4,1), 
               datetime.datetime(2021,5,1), datetime.datetime(2021,1,1), 
               datetime.datetime(2021,2,1)]
})

ddf = dd.from_pandas(pdf, npartitions=1) 

def func2(df):
    return df.date2.month

x = ddf.map_partitions(func2)  # <-- fails here

Answer:

df.date2 is a pandas Series of datetime values, not a single datetime object, so datetime attributes such as month have to be accessed through the .dt accessor. The fix in this case is:

def func2(df):
    return df.date2.dt.month
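
The difference can be seen in plain pandas, without Dask. The following is a minimal sketch using a shortened copy of the question's data; the variable names (col, has_month, months, first_month) are made up for illustration. Each partition that map_partitions hands to the function is an ordinary pandas DataFrame, so the same rules apply there.

```python
import datetime

import pandas as pd

# Shortened copy of the question's data (illustrative only).
pdf = pd.DataFrame({
    "date2": [datetime.datetime(2021, 3, 1), datetime.datetime(2021, 4, 1)],
})

# Selecting a column gives a Series with a datetime64 dtype, not a datetime.
col = pdf["date2"]

# The Series itself has no .month attribute; this is what raises the error.
has_month = hasattr(col, "month")

# The .dt accessor applies the datetime attribute element-wise.
months = col.dt.month.tolist()

# A single element is a Timestamp, which does have .month directly.
first_month = col.iloc[0].month
```

Here has_month is False, while col.dt.month yields one month per row.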

Note that this version of func2 accepts a dataframe but returns a series. That is fine, but in some use cases you may want to modify the dataframe and return the modified version. In that case, the function would look like this:

def func2(df):
    df['modified_column'] = df.date2.dt.month
    return df