Is there a faster way to assign a column to a dataframe (that has a condition) other than iloc (will

Time:07-20

df2.loc[(df2['feature'] == 0), 'package_loss'] =1

My code is above. I am trying to set the 'package_loss' column to 1 wherever the 'feature' column equals 0.

CodePudding user response:

Use dask.dataframe.Series.where. It keeps the values where the condition is True and replaces the rest, so the condition has to be inverted:

df2['package_loss'] = df2['package_loss'].where(df2['feature'] != 0, 1)

CodePudding user response:

This is not as terse as @jezrael's answer, but allows more flexible transformations using pandas syntax:

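from dask.datasets import timeseries


def add_col(df):
    # Each partition is a plain pandas DataFrame; copy it so the input is not mutated.
    df = df.copy()
    mask = df["name"] == "Dan"
    df["new_column"] = 0
    df.loc[mask, "new_column"] = 1
    return df


df = timeseries()
# Apply add_col lazily to every partition.
df2 = df.map_partitions(add_col)
df2.head()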