I have a huge DataFrame that I'm reading with Dask DataFrame. In pandas, I change a column based on a condition on another column's value like this:
df.loc[df['Ref'] != 'ABC', 'Ref2'] = np.nan
Then I forward-fill the changed column as shown below:
df['Ref2'] = df['Ref2'].fillna(method='ffill')
How can the same be achieved with a Dask DataFrame? I'm new to Dask.
CodePudding user response:
Use dask.dataframe.Series.mask and dask.dataframe.Series.fillna:
df['Ref2'] = df['Ref2'].mask(df['Ref']!='ABC').fillna(method = 'ffill')
CodePudding user response:
A different way to write this (closer to the pandas syntax):
mask = df['Ref']!='ABC'
df.loc[mask,'Ref2'] = np.nan
df['Ref2'] = df['Ref2'].fillna(method = 'ffill')
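Dask closely follows pandas syntax, so the pandas expression will often work unchanged. One caveat: Dask has not historically supported item assignment through .loc, so if the second line raises a TypeError, use Series.mask instead.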