I have a dask dataframe with one column named "hora" of integer type, and I want to create another column in time format. For example:
- my data is:
hora
10
17
22
19
14
- the result I hope to get for the first row is:
hora time
10 10:00:00
To do that, I am trying:
meta = ('time', 'datetime64[ns]')
df['hora'].map_partitions(dt.time, meta=meta).compute()
When I run the code above, it throws:
TypeError: cannot convert the series to <class 'int'>
However, when I test the same example with a pandas Series, it works.
I am applying the function "dt.time" the same way in both cases, so what is the error?
Thanks very much in advance.
CodePudding user response:
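By passing dt.time to map_partitions, you are effectively calling dt.time(partition) once for each partition of your dataframe. Each partition is a whole pandas Series, not a single integer, which is why the conversion to int fails. What you wanted was to apply the function to each value. You could have done either of the following: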
ddf.assign(s2=ddf.hora.map(dt.time))
or
def mapper(df):
    # df is one pandas partition; apply dt.time to each value in it
    df['s2'] = df.hora.apply(dt.time)
    return df
ddf.map_partitions(mapper)
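(providing meta is optional; if you do pass it, note that the new column holds datetime.time objects, so its dtype is object rather than datetime64[ns])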