Creating a new time-format column from a dask dataframe integer column

Time:11-25

I have a Dask dataframe with an integer column named "hora" (hour), and I want to create another column in time format. Here is an example:

  • My data is:
hora
10
17
22
19
14
  • The result I expect for the first row is:
hora time
10   10:00:00

To do that, I am trying:

meta = ('time', 'datetime64[ns]')
df['hora'].map_partitions(dt.time, meta=meta).compute()

When I run the code above, it throws:

TypeError: cannot convert the series to <class 'int'>

However, when I test the same example with a pandas Series, it works.
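A minimal pandas-only sketch of the difference, assuming `dt` is the standard `datetime` module (the import is not shown in the question). Calling `dt.time` on the whole Series fails because `datetime.time` needs a single integer hour; mapping it over the values calls `dt.time(10)`, `dt.time(17)`, and so on. The exact error text depends on the Python and pandas versions, but it is a `TypeError` either way.

```python
import datetime as dt

import pandas as pd

s = pd.Series([10, 17, 22, 19, 14], name="hora")

# Passing the whole Series to the constructor: datetime.time expects one
# integer hour, so converting the Series to an int raises TypeError
try:
    dt.time(s)
    whole_series_ok = True
except TypeError as exc:
    whole_series_ok = False
    print("TypeError:", exc)

# Mapping the constructor over each value works: dt.time(10) -> 10:00:00
times = s.map(dt.time)
print(times.iloc[0])  # 10:00:00
```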


I am applying the function "dt.time" the same way in both cases. What is causing the error?

Thanks very much in advance

Answer:

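By passing dt.time to map_partitions, you are effectively calling dt.time(partition) once per partition, where each partition is a whole pandas Series. datetime.time expects a single integer hour, so it tries to convert the entire Series to an int and fails with the TypeError you see. What you want is to apply the function to each value. You could do either of the following: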

import datetime as dt  # dt is assumed to be the datetime module

ddf.assign(s2=ddf.hora.map(dt.time))

or

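def mapper(df):
    # df is one pandas partition; return a copy with the new column
    # instead of mutating the input partition in place
    return df.assign(s2=df.hora.apply(dt.time))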

ddf.map_partitions(mapper)

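(Providing meta is optional; without it, Dask infers the output type itself. If you do pass it, declare the new column as object dtype, since datetime.time values are stored as Python objects, not as datetime64[ns].)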
