I'm just starting to use Dask as a possible replacement (?) for pandas. The first thing that struck me is that I can't seem to find a way to create a dataframe from a couple of lists/arrays.
In regular pandas I just do: pd.DataFrame({'a':a,'b':b,...})
but I can't find an equivalent way to do it in Dask, other than creating the df in pandas and then creating a Dask df with from_pandas().
Is there any way? Or is the only way literally to create the df in pandas and then "import" it into a Dask df?
CodePudding user response:
There is a fairly recent feature by @MrPowers that allows creating a dask.DataFrame using the from_dict method:
from dask.dataframe import DataFrame
ddf = DataFrame.from_dict({"num1": [1, 2, 3], "num2": [7, 8, 9]}, npartitions=2)
However, note that this method is meant to keep dask.DataFrame code concise in tutorials and code examples, so when working with real datasets it's better to use more appropriate methods, e.g. read_csv or read_parquet.