Creating dask dataframe from array doesn't keep column types

Time:01-26

I'm trying to create a dask dataframe from a NumPy array, and I need to specify the column types. As suggested in the dask documentation, I pass an empty pandas DataFrame as meta. This doesn't throw an error, but every column ends up with dtype object. I need to use the empty pandas DataFrame; how can I make this work?

from datetime import datetime
import numpy as np
import pandas as pd
import dask.dataframe as dd

array = np.array([(1.5, 2, 3, datetime(2000,1,1)), (4, 5, 6, datetime(2001, 2, 2))])
meta = pd.DataFrame({'col1': pd.Series(dtype='float64'),
                   'col2': pd.Series(dtype='float64'),
                   'col3': pd.Series(dtype='float64'),
                   'date1': pd.Series(dtype='datetime64[ns]')})
print(meta.dtypes)

>>> col1            float64
>>> col2            float64
>>> col3            float64
>>> date1    datetime64[ns]
>>> dtype: object

columns = ['col1', 'col2', 'col3', 'date1']
ddf = dd.from_array(array, columns=columns, meta=meta)
ddf.compute()

print(ddf.dtypes)

>>> col1     object
>>> col2     object
>>> col3     object
>>> date1    object
>>> dtype: object

CodePudding user response:

Does this work? Build a pandas DataFrame first, cast the columns, and then convert it to dask:

df = (pd.DataFrame(array, columns=["col1", "col2", "col3", "date1"])
      .astype({"col1": "float64",
               "col2": "float64",
               "col3": "float64",
               "date1": "datetime64[ns]"}))
ddf = dd.from_pandas(df, npartitions=10)

The output of ddf.dtypes gives me the correct data types.
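
A likely reason the pandas route helps: the NumPy array in the question mixes floats, ints and datetime objects, so NumPy stores everything with dtype object before dask ever sees it. The sketch below only uses NumPy and pandas to show that, and then applies the same astype cast; that dask derives its column dtypes from the array rather than from meta is my assumption based on the output in the question.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Same data as in the question: mixed floats, ints and datetime objects.
array = np.array([(1.5, 2, 3, datetime(2000, 1, 1)),
                  (4, 5, 6, datetime(2001, 2, 2))])

# NumPy has no common numeric type for these values, so it falls back to object.
print(array.dtype)  # object

# Casting in pandas gives each column an explicit dtype.
columns = ["col1", "col2", "col3", "date1"]
df = pd.DataFrame(array, columns=columns).astype({"col1": "float64",
                                                  "col2": "float64",
                                                  "col3": "float64",
                                                  "date1": "datetime64[ns]"})
print(df.dtypes)
```

Once df has the right dtypes, dd.from_pandas(df, npartitions=...) carries them over, as described above.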

CodePudding user response:

Could the dtypes be set with astype after the dataframe is created?

import pandas as pd
import numpy as np
from datetime import datetime
import dask.dataframe as dd

array = np.array([(1.5, 2, 3, datetime(2000,1,1)), (4, 5, 6, datetime(2001, 2, 2))])

columns = ['col1', 'col2', 'col3', 'date1']
ddf = dd.from_array(array, columns = columns)
ddf.compute()

ddf = ddf.astype({'col1': 'float64','col2':'float64','col3':'float64','date1':'datetime64[ns]'})
print(ddf.dtypes)
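
Another option that might avoid the cast entirely is to give NumPy the column types up front with a structured (record) dtype, so each field keeps its own type instead of collapsing to object. This is only a sketch: the names rows, record_dtype and records are mine, and it checks the dtypes with NumPy and pandas only. The dd.from_array docstring mentions arrays with a record dtype, but whether your dask version preserves the field dtypes is something to verify yourself.

```python
from datetime import datetime

import numpy as np
import pandas as pd

rows = [(1.5, 2, 3, datetime(2000, 1, 1)),
        (4, 5, 6, datetime(2001, 2, 2))]

# One dtype per field, mirroring the meta dataframe from the question.
record_dtype = np.dtype([("col1", "f8"),
                         ("col2", "f8"),
                         ("col3", "f8"),
                         ("date1", "M8[ns]")])

# Each tuple becomes one record; values are converted field by field.
records = np.array(rows, dtype=record_dtype)
print(records.dtype)

# pandas reads the field names and dtypes straight from the record array.
df = pd.DataFrame(records)
print(df.dtypes)
```

If this holds up in dask as well, passing records to dd.from_array would not need columns or meta, because the field names already act as column names.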