I have a dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame([0.05, 0.04, 0.08, 0.8, -0.4, 1.0, -1.0, 1.8, -0.5, -0.05])
I need to replace with np.nan all elements in each column that are outside the limits mean + 3*std and mean - 3*std. In other words, values less than mean - 3*std and values greater than mean + 3*std should be replaced by np.nan.
I am showing only one column of my dataframe here, but I actually have eleven columns, and in each one the values outside that column's own range must be replaced by np.nan, all in place within the original df.
Is there a simple way of doing this? I have tried to use lambda functions with where, but it doesn't work.
CodePudding user response:
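Use DataFrame.mask, which replaces values with NaN wherever the condition is True. The mean and std are computed per column, so this works for all columns at once: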
mean = df.mean()
std = df.std()
outliers_mask = (df < mean - 3*std) | (df > mean + 3*std)
# replace outliers with NaN
df = df.mask(outliers_mask)
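To check the approach above on more than one column, here is a minimal sketch. The helper name `mask_outliers`, the `n_std` parameter, and the `demo` frame are made up for illustration and are not part of the question. It assumes pandas' default sample standard deviation (ddof=1).

```python
import numpy as np
import pandas as pd


def mask_outliers(frame: pd.DataFrame, n_std: float = 3.0) -> pd.DataFrame:
    """Return a copy with values outside mean +/- n_std * std set to NaN, per column."""
    mean = frame.mean()  # one mean per column
    std = frame.std()    # one sample std (ddof=1) per column
    # Comparing a DataFrame with a Series aligns the Series index on the columns.
    outside = (frame < mean - n_std * std) | (frame > mean + n_std * std)
    return frame.mask(outside)


# Hypothetical data: column "a" has one extreme value, column "b" has none.
demo = pd.DataFrame({
    "a": [1.0] * 19 + [100.0],
    "b": [float(i) for i in range(20)],
})
cleaned = mask_outliers(demo)
print(cleaned.isna().sum())
```

Assigning the result back (`df = mask_outliers(df)`) gives the same effect as the answer's last line. Note that with the ten sample values from the question, the bounds come out to roughly -2.26 and 2.63, so none of those values is replaced.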
CodePudding user response:
IIUC
df = pd.DataFrame([0.05, 0.04, 0.08, 0.8, -0.4, 1.0, -1.0, 1.8, -0.5, -0.05])
condition_list = [(df[0] < df[0].quantile(.25)) | (df[0] > df[0].quantile(.75))]
choice_list = [np.nan]
# assign back to column 0 (the integer label) so the values are replaced in the original column
df[0] = np.select(condition_list, choice_list, df[0])
df
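The quantile idea in this answer can also be applied to every column at once. The following is only a sketch under assumptions: the `demo` frame and the names `q1`, `q3`, and `inside` are invented for illustration, and it keeps the answer's 25% and 75% cut-offs, which are a different rule from the mean +/- 3*std limits in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame used only for this illustration.
demo = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [10.0, 20.0, 30.0, 40.0, 50.0],
})

q1 = demo.quantile(0.25)  # Series with one lower cut-off per column
q3 = demo.quantile(0.75)  # Series with one upper cut-off per column

# Keep values inside [q1, q3] for their own column; everything else becomes NaN.
inside = demo.ge(q1) & demo.le(q3)
cleaned = demo.where(inside)
print(cleaned)
```

`DataFrame.where` keeps values where the condition is True, so it is the mirror image of `DataFrame.mask`.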