How to delete a dataframe cell based on the value of the right cell

I have a dataframe (df) where column V01 validates column D01. If the value in column V01 is 'N', then the value of column D01 in the same row is invalid and should be deleted.

import pandas as pd
df = pd.DataFrame({'D01':[5,551,2,4],'V01':['V','N','V','V']})
df

In this example, I would like to replace 551 with a null. In my case I have columns D01 to D31 and V01 to V31. How could I approach this cleaning?

I've tried

df = df.replace('N',None)
df = df.dropna()
df

But this removes the whole row, including valid data in the other columns.

Answer:

Use boolean indexing to filter out the rows where V01 is 'N' (note that this removes the entire row, not just the D01 value):

df = df[df.V01 != 'N']
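The same row filter can also be written with DataFrame.query, which evaluates an expression string against the column names. This is a minimal sketch on the question's sample data; like the boolean-indexing version, it drops the whole row rather than blanking only the D01 cell.

```python
import pandas as pd

df = pd.DataFrame({'D01': [5, 551, 2, 4], 'V01': ['V', 'N', 'V', 'V']})

# Keep only rows whose V01 flag is not 'N'.
# The row with V01 == 'N' is removed entirely, D01 included.
filtered = df.query("V01 != 'N'")
print(filtered)
```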

Answer:

Input:

import pandas as pd
df = pd.DataFrame({'D01':[5,551,2,4],'V01':['V','N','V','V']})

Use pandas.Series.where:

df["D01"] = df["D01"].where(df["V01"] != 'N', None)

or use pandas.DataFrame.loc:

df.loc[df["V01"] == 'N', "D01"] = None

output:

   D01 V01
0  5.0   V
1  NaN   N
2  2.0   V
3  4.0   V
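To cover all pairs from D01/V01 to D31/V31, the Series.where approach can be applied column by column. The sketch below assumes every Dxx column has a matching Vxx column with the same two-digit suffix; the D02/V02 columns are hypothetical sample data added only to show more than one pair.

```python
import pandas as pd

# Hypothetical sample with two pairs; real data would have D01..D31 and V01..V31
df = pd.DataFrame({
    'D01': [5, 551, 2, 4],
    'V01': ['V', 'N', 'V', 'V'],
    'D02': [7, 8, 9, 10],
    'V02': ['N', 'V', 'V', 'V'],
})

# Collect the data columns, then derive each flag column from the suffix
d_cols = [c for c in df.columns if c.startswith('D')]
for d_col in d_cols:
    v_col = 'V' + d_col[1:]  # e.g. 'D07' -> 'V07'
    # Keep the value where the flag is not 'N'; otherwise it becomes NaN
    df[d_col] = df[d_col].where(df[v_col] != 'N')

print(df)
```

Deriving the V name from the D name avoids hard-coding the 31 pairs. If a D column has no matching V column, the lookup raises a KeyError.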

Answer:

Using pandas.DataFrame.loc, as others have suggested, is probably better because vectorized pandas operations are faster than a Python loop, but here is a loop version:

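import pandas as pd

D = [5, 551, 2, 4]
V = ['V', 'N', 'V', 'V']
df = pd.DataFrame({'D01': D, 'V01': V})
print(df)

# Walk the rows by position: column 0 is D01, column 1 is V01
for i in range(len(df)):
    if df.iat[i, 1] == 'N':
        # Invalid reading: blank out the D01 value in this row
        df.iat[i, 0] = None
print(df)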
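Following up on the speed point, all D columns can be cleaned in one vectorized call with DataFrame.mask, which replaces values with NaN wherever the condition is True. This is a sketch assuming each Dxx column has a Vxx partner; the D02/V02 columns are hypothetical sample data. The condition is passed as a NumPy array so pandas does not try to align the V column labels with the D column labels.

```python
import pandas as pd

# Hypothetical sample with two pairs; real data would have D01..D31 and V01..V31
df = pd.DataFrame({
    'D01': [5, 551, 2, 4],
    'V01': ['V', 'N', 'V', 'V'],
    'D02': [7, 8, 9, 10],
    'V02': ['N', 'V', 'V', 'V'],
})

d_cols = sorted(c for c in df.columns if c.startswith('D'))
v_cols = ['V' + c[1:] for c in d_cols]  # same order as d_cols

# Boolean matrix (rows x pairs): True where the matching V flag is 'N'
invalid = df[v_cols].eq('N').to_numpy()

# Positional mask: NaN wherever invalid is True, original value elsewhere
df[d_cols] = df[d_cols].mask(invalid)

print(df)
```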

Answer:

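You can use DataFrame.drop() with the index labels of the matching rows to delete rows based on a column value condition. Note that this removes the entire row, not just the D01 value. Hope it helps.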
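A minimal sketch of the drop approach on the question's sample data: select the index labels of the rows flagged 'N', then pass them to DataFrame.drop.

```python
import pandas as pd

df = pd.DataFrame({'D01': [5, 551, 2, 4], 'V01': ['V', 'N', 'V', 'V']})

# Index labels of the rows whose V01 flag is 'N'
bad_rows = df[df['V01'] == 'N'].index

# drop() removes those rows entirely, both D01 and V01
dropped = df.drop(index=bad_rows)
print(dropped)
```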