Pandas Dataframe - replace NaN with 0 if column value condition

Time:07-29

I have searched all over the internet and tried many methods before making this post. I have a dataframe in which I want to:

  • Replace NaN values in TGT_COLUMN_SCALE with 0 if TGT_COLUMN_DATA_TYPE equals NUMERIC.

my dataframe

Kindly help me out with this issue.

I tried this code but it's not working:

df["TGT_COLUMN_SCALE"] = np.where(df["TGT_COLUMN_DATA_TYPE"] == "NUMERIC", 'NaN', 0)

CodePudding user response:

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "TGT_COLUMN_DATA_TYPE": ["DATE", "NUMERIC", "STRING", "NUMERIC"],
    "TGT_COLUMN_SCALE": [np.nan, np.nan, 4.0, 5.0]  # np.nan: the np.NaN alias was removed in NumPy 2.0
})

Replace

df.loc[(df.TGT_COLUMN_DATA_TYPE == "NUMERIC") & (df.TGT_COLUMN_SCALE.isnull()), "TGT_COLUMN_SCALE"] = 0

Result:

  TGT_COLUMN_DATA_TYPE  TGT_COLUMN_SCALE
0                 DATE               NaN
1              NUMERIC               0.0
2               STRING               4.0
3              NUMERIC               5.0
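
For readers who want to see how the condition in the replace step is put together, here is a minimal sketch that builds the two boolean masks separately before combining them. The variable names is_numeric, is_missing and mask are only illustrative and do not appear in the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "TGT_COLUMN_DATA_TYPE": ["DATE", "NUMERIC", "STRING", "NUMERIC"],
    "TGT_COLUMN_SCALE": [np.nan, np.nan, 4.0, 5.0],
})

# True where the data type is NUMERIC (rows 1 and 3)
is_numeric = df["TGT_COLUMN_DATA_TYPE"] == "NUMERIC"
# True where the scale is missing (rows 0 and 1)
is_missing = df["TGT_COLUMN_SCALE"].isna()

# Element-wise AND: only row 1 satisfies both conditions
mask = is_numeric & is_missing

# Assign 0 only to the selected rows of the target column
df.loc[mask, "TGT_COLUMN_SCALE"] = 0
print(df)
```

The parentheses around each comparison in the one-line version matter because & binds more tightly than == in Python; naming the masks first avoids that pitfall.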

CodePudding user response:

You just need to use loc to select the rows where the data type is NUMERIC, and then use fillna to replace the missing values in the scale column:

df.loc[df.TGT_COLUMN_DATA_TYPE == "NUMERIC",
       "TGT_COLUMN_SCALE"] = df.loc[df.TGT_COLUMN_DATA_TYPE == "NUMERIC", "TGT_COLUMN_SCALE"].fillna(0)

Full code

import numpy as np
import pandas as pd

TGT_COLUMN_DATA_TYPE = ('DATE', 'TIMESTAMP', 'NUMERIC', 'NUMERIC')
TGT_COLUMN_SCALE = (np.nan, np.nan, np.nan, np.nan)
df = pd.DataFrame(list(zip(TGT_COLUMN_DATA_TYPE, TGT_COLUMN_SCALE)),
                  columns=['TGT_COLUMN_DATA_TYPE', 'TGT_COLUMN_SCALE'])
# Fill NaN with 0 only in the rows whose data type is NUMERIC
df.loc[df.TGT_COLUMN_DATA_TYPE == "NUMERIC",
       "TGT_COLUMN_SCALE"] = df.loc[df.TGT_COLUMN_DATA_TYPE == "NUMERIC", "TGT_COLUMN_SCALE"].fillna(0)

CodePudding user response:

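np.where uses its second argument as the value where the condition is true, and its third argument otherwise. You need to swap the order of NaN and 0. To keep the values that are already present, also check for missing values in the condition and pass the existing column as the fallback: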

df["TGT_COLUMN_SCALE"] = np.where((df["TGT_COLUMN_DATA_TYPE"] == "NUMERIC") & (df["TGT_COLUMN_SCALE"].isnull()), 0, df["TGT_COLUMN_SCALE"])
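
To make the argument order concrete, here is a small sketch that first shows np.where on a toy condition and then applies the same pattern to a sample frame. The sample data is borrowed from the first answer, and the names order_demo and cond are only illustrative.

```python
import numpy as np
import pandas as pd

# np.where(condition, value_if_true, value_if_false)
order_demo = np.where([True, False], "first", "second")
print(order_demo.tolist())  # ['first', 'second']

df = pd.DataFrame({
    "TGT_COLUMN_DATA_TYPE": ["DATE", "NUMERIC", "STRING", "NUMERIC"],
    "TGT_COLUMN_SCALE": [np.nan, np.nan, 4.0, 5.0],
})

# Rows that are NUMERIC and currently have no scale
cond = (df["TGT_COLUMN_DATA_TYPE"] == "NUMERIC") & df["TGT_COLUMN_SCALE"].isna()

# 0 where cond is True, otherwise the value that was already in the column
df["TGT_COLUMN_SCALE"] = np.where(cond, 0, df["TGT_COLUMN_SCALE"])
print(df)
```

Unlike the loc assignment, np.where builds a whole new array for the column, which is why the existing column has to be passed as the third argument to preserve the untouched rows.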