I have a user-defined function tmp_func and a dask dataframe df. I would like to apply this function to each group of df.
def tmp_func(s_df):
    ...
    return s_df

result = df.groupby('id').apply(tmp_func, meta=meta)
result = result.compute(scheduler='processes')
It is recommended to specify the dtypes of the columns of the dataframe returned by tmp_func. In my case, the dataframe returned by tmp_func has over 20,000 columns, all of which contain only natural numbers, so I think np.int8 is the appropriate dtype.
Is there any way to specify that all columns have the same dtype np.int8? It would be a nightmare to write out a dictionary with over 20,000 entries by hand.
CodePudding user response:
Just use a dict comprehension (this assumes tmp_func returns the same columns as df):
result = df.groupby('id').apply(tmp_func, meta={col: np.int8 for col in df.columns})
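If tmp_func returns columns that differ from df.columns, the same idea should work with an explicit list of output names. Below is a minimal sketch under stated assumptions: the column names (col_0, col_1, ...) and the helper apply_per_group are hypothetical and only for illustration. Dask accepts dtype strings such as "int8" as values in a meta dict, so the dict can be built without numpy.

```python
# Hypothetical output column names; replace these with the names
# that tmp_func actually returns.
output_columns = [f"col_{i}" for i in range(20_000)]

# Map every output column to the same dtype. The string "int8" is
# used here instead of np.int8 so that building the dict needs no imports.
meta = dict.fromkeys(output_columns, "int8")


def apply_per_group(ddf, func, meta, key="id"):
    """Hypothetical helper: apply func to each group of a dask dataframe.

    ddf is assumed to be a dask dataframe with a column named by key.
    """
    return ddf.groupby(key).apply(func, meta=meta)
```

With a real dask dataframe this would be called as apply_per_group(df, tmp_func, meta).compute(scheduler='processes'). Note that int8 only holds values from -128 to 127, so a wider integer dtype is needed if any count can exceed 127.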