Trying to drop values by column (I convert these values to NaN, but they could be anything): not working

Time:05-26

I'm trying to drop NAs by column in Dask, given a certain threshold, and I receive an error.

As far as I can tell this should be working. Please advise.


Reproducible example:

import numpy as np
import pandas as pd
import dask.dataframe as dd

data = [['tom', 10], ['nick', 15], ['juli', 5]]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Turn the value 5 into a missing value (NaN)
df = df.replace(5, np.nan)

ddf = dd.from_pandas(df, npartitions=2)

# This call raises an error: Dask's dropna has no axis parameter
ddf.dropna(axis='columns')
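For comparison, here is what the same column-wise drop does in plain pandas, where axis is supported. This only illustrates the behaviour the question expects; it uses the same toy data and adds a thresh variant to show how the threshold is interpreted (minimum number of non-NA values a column must have to be kept).

```python
import numpy as np
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 5]]
df = pd.DataFrame(data, columns=['Name', 'Age']).replace(5, np.nan)

# 'Age' contains one NaN, so a plain column-wise dropna removes it
any_na = df.dropna(axis='columns')
# thresh=3: keep only columns with at least 3 non-NA values ('Name')
thresh3 = df.dropna(axis='columns', thresh=3)
# thresh=2: 'Age' has 2 non-NA values, so both columns are kept
thresh2 = df.dropna(axis='columns', thresh=2)

print(list(any_na.columns), list(thresh3.columns), list(thresh2.columns))
# prints: ['Name'] ['Name'] ['Name', 'Age']
```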

Answer:

Passing axis is not supported for Dask DataFrames as of now. You can also print the function's docstring via ddf.dropna? (in IPython/Jupyter), and it will tell you the same:

Signature: ddf.dropna(how='any', subset=None, thresh=None)
Docstring:
Remove missing values.

This docstring was copied from pandas.core.frame.DataFrame.dropna.

Some inconsistencies with the Dask version may exist.

See the :ref:`User Guide <missing_data>` for more on which values are
considered missing, and how to work with missing data.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0  (Not supported in Dask)
    Determine if rows or columns which contain missing values are
    removed.

    * 0, or 'index' : Drop rows which contain missing values.
    * 1, or 'columns' : Drop columns which contain missing value.

    .. versionchanged:: 1.0.0

       Pass tuple or list to drop on multiple axes.
       Only a single axis is allowed.

how : {'any', 'all'}, default 'any'
    Determine if row or column is removed from DataFrame, when we have
    at least one NA or all NA.

    * 'any' : If any NA values are present, drop that row or column.
    * 'all' : If all values are NA, drop that row or column.

thresh : int, optional
    Require that many non-NA values.
subset : array-like, optional
    Labels along other axis to consider, e.g. if you are dropping rows
    these would be a list of columns to include.
inplace : bool, default False  (Not supported in Dask)
    If True, do operation inplace and return None.

Returns
-------
DataFrame or None
    DataFrame with NA entries dropped from it or None if ``inplace=True``.

It's worth noting that Dask's documentation is copied from pandas in many cases like this one. Wherever that happens, it explicitly states:

This docstring was copied from pandas.core.frame.DataFrame.drop. Some inconsistencies with the Dask version may exist.

Therefore it's always best to check the docstring of Dask's pandas-derived functions instead of relying on the pandas documentation alone.
