I have a dataframe (df) where column V01 validates column D01. If the value in column V01 is 'N', then the value of column D01 in the same row is invalid and should be deleted.
import pandas as pd
df = pd.DataFrame({'D01':[5,551,2,4],'V01':['V','N','V','V']})
df
In this example, I would like to replace 551 with a null. In my case I have columns from D01 to D31 and V01 to V31. How could I approach this cleaning?
I've tried
df = df.replace('N',None)
df = df.dropna()
df
But this drops the whole row, including some valid data.
CodePudding user response:
Use boolean indexing on the dataframe to filter out the rows where V01 is 'N':
df = df[df.V01 != 'N']
CodePudding user response:
Input:
import pandas as pd
df = pd.DataFrame({'D01':[5,551,2,4],'V01':['V','N','V','V']})
Use pandas.Series.where:
df["D01"] = df["D01"].where(df["V01"] != 'N', None)
or use pandas.DataFrame.loc:
df.loc[df["V01"] == 'N', "D01"] = None
Output:
   D01 V01
0  5.0   V
1  NaN   N
2  2.0   V
3  4.0   V
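The question mentions columns D01 to D31 paired with V01 to V31. One way to extend the single-column approach is to loop over the suffixes and mask each data column with its validator. This is only a sketch: it assumes every DXX column has a VXX partner with the same zero-padded suffix, and the D02/V02 values below are made up for illustration. It uses pandas.Series.mask, which is the inverse of where.

```python
import pandas as pd

# Stand-in for the real frame: two D/V pairs instead of 31.
# The D02/V02 values are invented for this example.
df = pd.DataFrame({
    "D01": [5, 551, 2, 4],
    "V01": ["V", "N", "V", "V"],
    "D02": [7, 8, 999, 1],
    "V02": ["V", "V", "N", "V"],
})

# Assumption: each data column DXX is validated by VXX (same suffix).
for i in range(1, 32):
    d_col, v_col = f"D{i:02d}", f"V{i:02d}"
    # Skip pairs that are not present in this frame.
    if d_col in df.columns and v_col in df.columns:
        # mask() returns a new Series with NaN wherever the condition is True.
        df[d_col] = df[d_col].mask(df[v_col] == "N")

print(df)
```

Only the flagged cells become NaN; the rows themselves and the values in the other columns are kept.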
CodePudding user response:
Using pandas.DataFrame.loc, as others have said, is probably better in Python because it is faster, but here is a loop version.
import pandas as pd
D=[5,551,2,4]
V=['V','N','V','V']
df = pd.DataFrame({'D01':D,'V01':V})
print(df)
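# Cast to float first so the column can hold NaN
# (recent pandas versions warn when an int column is upcast on assignment)
df['D01'] = df['D01'].astype(float)
for i in range(len(D)):
    # Column 1 is V01, column 0 is D01
    if df.iat[i,1]=='N':
        df.iat[i,0] = None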
print(df)
CodePudding user response:
Use DataFrame.drop() to delete rows based on a column value condition. Hope it helps.
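For reference, a minimal sketch of what that could look like on the example frame from the question. The names invalid_rows and cleaned are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'D01': [5, 551, 2, 4], 'V01': ['V', 'N', 'V', 'V']})

# Index labels of the rows where V01 is 'N'
invalid_rows = df.index[df["V01"] == "N"]

# drop() returns a new DataFrame without those rows; df itself is unchanged
cleaned = df.drop(invalid_rows)
print(cleaned)
```

Note that this removes the entire row rather than blanking the single D01 cell, so it behaves differently from the where and loc approaches in the earlier answers.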