I have a dataframe with missing values in one column. I'm trying to fill them by finding the NaN values and then checking whether another column in the same row holds a specific string. If it does, the missing value should be replaced with another specific string. I tried the following but got a SettingWithCopyWarning:
sLength = len(df['Column 5'])
for x in range(sLength):
    if pd.isnull(df.iloc[x, 5]):
        if df.iloc[x, 4] == "String 1":
            df.loc[x, 5] = "String 2"
Is there another way to do this?
CodePudding user response:
I think np.where() will solve your problem, and it will be faster than a loop.
np.where(condition, value_if_true, value_if_false)
evaluates the condition for every row and takes the second argument where it is true and the third where it is false.
import pandas as pd
import numpy as np
For example, to fill missing Job values with 'Web Developer' wherever Service is 'Engineer':
df['Job'] = np.where((df['Job'].isna() & df['Service'].eq('Engineer')),'Web Developer',df['Job'])
In your case you can use:
df['Column 5'] = np.where((df['Column 5'].isna() & df['Column 4'].eq('String 1')), 'String 2', df['Column 5'])
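To check the behaviour, here is a minimal sketch of the same np.where() call on a small made-up dataframe. Only the column names and the two strings come from the question; the sample rows and the needs_fill variable name are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Column 4": ["String 1", "String 1", "Other", "Other"],
    "Column 5": [np.nan, "kept", np.nan, "also kept"],
})

# True only where Column 5 is missing AND Column 4 holds the trigger string.
needs_fill = df["Column 5"].isna() & df["Column 4"].eq("String 1")

# Take "String 2" where the mask is True, otherwise keep the existing value.
df["Column 5"] = np.where(needs_fill, "String 2", df["Column 5"])

print(df)
```

Only the first row is changed to "String 2". The third row is missing too, but its Column 4 value does not match, so it stays missing, and existing values are never overwritten.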
CodePudding user response:
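You can also use fillna together with mask:
# mask() offers 'String 2' wherever Column 4 is 'String 1';
# fillna() then uses that offer only for rows where Column 5 is missing.
df['Column 5'] = df['Column 5'].fillna(
    df['Column 5'].mask(df['Column 4'].eq('String 1'), 'String 2')
)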
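As a rough sketch of how the fillna approach behaves, the snippet below runs it on a small made-up dataframe, using the literal 'String 2' from the question as the replacement. The sample rows and the candidates variable name are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Column 4": ["String 1", "String 1", "Other", "Other"],
    "Column 5": [np.nan, "kept", np.nan, "also kept"],
})

# Build candidate fill values: "String 2" wherever Column 4 matches,
# and the original Column 5 value (possibly NaN) everywhere else.
candidates = df["Column 5"].mask(df["Column 4"].eq("String 1"), "String 2")

# fillna only touches rows where Column 5 is missing, so existing
# values survive even when a candidate exists for that row.
df["Column 5"] = df["Column 5"].fillna(candidates)

print(df)
```

The second row shows why the two steps are combined: mask() proposes "String 2" there, but fillna() ignores the proposal because the row already has a value.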