Dask drop() not dropping columns when I need it to

Time:12-15

I'm new to Dask, and the way columns are dropped confuses me. I've read a CSV file into a Dask dataframe. Suppose I then run this:

print(len(columns_to_drop))   # There are 66
print(len(list(df.columns)))  # The Dask columns before the drop
df.drop(columns_to_drop, axis=1).compute()  # Drop the columns
pd_df = df.compute()  #  Create a Pandas dataframe
print(pd_df.shape[1])  # Pandas dataframe columns
print(len(list(df.columns)))  # The Dask columns after the drop

What I get from the print statements:

  • 66 columns to drop
  • 207 Dask df columns before the drop
  • 207 Pandas column count
  • 207 Dask columns after the drop

Answer:

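You need to keep the result of drop(), because by default it returns a copy of the original dataframe with the specified columns removed and leaves the original untouched. Dask's drop() does not accept inplace=True the way pandas does, so assign the result back to df:

df = df.drop(columns_to_drop, axis=1)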

Answer:

Assuming that the dataframe fits into memory, this should do the trick:

df = df.drop(columns_to_drop, axis=1)  # Drop the columns
pd_df = df.compute()  #  Create a Pandas dataframe