I have the following dataframe:
import pandas as pd
dict1 = {'x_math_lp': {'John': '0',
                       'Lisa': 1,
                       'Karyn': '2'},
         'o_math_lp': {'John': 0.005,
                       'Lisa': 0.001,
                       'Karyn': 0.9}}
df = pd.DataFrame(dict1)
I would like to apply a condition such that if a value in the first column is less than 1 and the value in the second column is >= 0.05, the value in the first column is replaced with NaN.
The result should look like this:
       x_math_lp  o_math_lp
John         NaN      0.005
Lisa           1      0.001
Karyn        NaN      0.900
Note: The reason I want to use a loop is that my real dataframe has 30 columns and I want to do this for every column pair in the dataframe, essentially updating the entire dataframe.
CodePudding user response:
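You can use .loc to select the rows of the target column that meet your condition and assign NaN to them, as shown below. Because some values in x_math_lp are strings, convert the column with pd.to_numeric first. The condition below combines the two tests with | (OR), which is what reproduces your expected output.
Try this: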
>>> import numpy as np
>>> df.x_math_lp = pd.to_numeric(df.x_math_lp, errors='coerce')
>>> df.loc[((df['x_math_lp'] < 1) | (df['o_math_lp'] >= 0.005)), 'x_math_lp'] = np.nan
>>> df
       x_math_lp  o_math_lp
John         NaN      0.005
Lisa         1.0      0.001
Karyn        NaN      0.900
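For comparison, here is a minimal sketch of the same replacement written with Series.mask, which returns a new Series instead of modifying df in place. It also shows what happens if the wording of the question is applied literally (both tests joined with & and a threshold of 0.05): on this sample data no value would be replaced. The names x, either and both are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'x_math_lp': {'John': '0', 'Lisa': 1, 'Karyn': '2'},
    'o_math_lp': {'John': 0.005, 'Lisa': 0.001, 'Karyn': 0.9},
})

# Convert the mixed str/int column to numbers; cast to float so NaN fits.
x = pd.to_numeric(df['x_math_lp'], errors='coerce').astype(float)

# OR: replace the value if either test holds (matches the expected output).
either = x.mask((x < 1) | (df['o_math_lp'] >= 0.005))

# AND: replace the value only if both tests hold (the question's wording).
both = x.mask((x < 1) & (df['o_math_lp'] >= 0.05))

print(either)
print(both)
```

If the OR version is the one you want, assign it back with df['x_math_lp'] = either.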
To apply the same condition to every column pair, loop over the columns two at a time:
>>> df = pd.DataFrame({'x_math_lp': {'John': 0, 'Lisa': 1, 'Karyn': 2},
...                    'o_math_lp': {'John': 0.005, 'Lisa': 0.001, 'Karyn': 0.9},
...                    'y_math_lp': {'John': 0, 'Lisa': 1, 'Karyn': 2},
...                    'p_math_lp': {'John': 0.005, 'Lisa': 0.001, 'Karyn': 0.9}})
>>> columns = df.columns
>>> for a, b in zip(columns[::2], columns[1::2]):
...     df.loc[(df[a] < 1) | (df[b] >= 0.005), a] = np.nan
...
>>> df
       x_math_lp  o_math_lp  y_math_lp  p_math_lp
John         NaN      0.005        NaN      0.005
Lisa         1.0      0.001        1.0      0.001
Karyn        NaN      0.900        NaN      0.900
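For a wider dataframe such as the 30-column one mentioned in the question, the loop could be wrapped in a small helper. This is only a sketch under a few assumptions: the columns are ordered as adjacent pairs (value column first, then its partner), some value columns may contain numeric strings, and the OR condition with thresholds 1 and 0.005 is the intended rule. The function name mask_pairs and its parameters value_limit and p_limit are hypothetical, not part of pandas.

```python
import pandas as pd


def mask_pairs(df, value_limit=1, p_limit=0.005):
    """Return a copy of df where, for each adjacent column pair (a, b),
    a is set to NaN when a < value_limit or b >= p_limit."""
    out = df.copy()
    cols = out.columns
    for a, b in zip(cols[::2], cols[1::2]):
        # Coerce both columns to float so strings like '0' compare correctly.
        left = pd.to_numeric(out[a], errors='coerce').astype(float)
        right = pd.to_numeric(out[b], errors='coerce').astype(float)
        out[a] = left.mask((left < value_limit) | (right >= p_limit))
    return out


df = pd.DataFrame({
    'x_math_lp': {'John': '0', 'Lisa': 1, 'Karyn': '2'},
    'o_math_lp': {'John': 0.005, 'Lisa': 0.001, 'Karyn': 0.9},
    'y_math_lp': {'John': 0, 'Lisa': 1, 'Karyn': 2},
    'p_math_lp': {'John': 0.005, 'Lisa': 0.001, 'Karyn': 0.9},
})
result = mask_pairs(df)
print(result)
```

Because the helper works on a copy, the original df is left untouched. With an odd number of columns, zip simply ignores the last unpaired column.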