How to specify the same datatype for all (over 20.000) columns in meta for dask?

Time: 12-08

I have a user-defined function tmp_func and a dask dataframe df. I would like to apply this function on each group of df.

def tmp_func(s_df):
    ...
    return s_df

result = df.groupby('id').apply(tmp_func, meta = meta)
result = result.compute(scheduler = 'processes')

It is recommended to specify the dtypes of the columns of the dataframe returned by tmp_func. In my case, the resulting dataframe from tmp_func has over 20,000 columns, which contain only natural numbers, so I think np.int8 is the appropriate datatype.

Is there any way to specify that all columns have the same datatype np.int8? It would be a nightmare to write out a dictionary with over 20,000 entries by hand.

CodePudding user response:

Just use a dict comprehension:

result = df.groupby('id').apply(tmp_func, meta={col: np.int8 for col in df.columns})
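As a minimal sketch of the idea (using a small hypothetical pandas frame in place of the real 20,000-column dask dataframe, with made-up column names), the meta mapping is built once from the column index rather than written out by hand. Note that meta must describe the columns tmp_func actually returns; if the function drops or adds columns, build the dict from those names instead of df.columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dask dataframe `df`;
# the real data has 20,000+ integer-valued columns.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "a":  [0, 1, 2],
    "b":  [3, 4, 5],
})

# Build the meta mapping once for every column, instead of
# writing a 20,000-entry dict literal by hand.
meta = {col: np.int8 for col in df.columns if col != "id"}

# With dask, this dict would then be passed as
#   df.groupby('id').apply(tmp_func, meta=meta)
```

The same comprehension works on a dask dataframe, since dask dataframes expose a `.columns` index just like pandas.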