How to replace by np.nan elements outside a range of float numbers in a row in pandas?

Time:06-16

I have a dataframe:

df = pd.DataFrame([0.05, 0.04, 0.08, 0.8, -0.4, 1.0, -1.0, 1.8, -0.5, -0.05])

I need to replace with np.nan all elements that fall outside the limits mean - 3*std and mean + 3*std. In other words, values lower than mean - 3*std and values higher than mean + 3*std should be replaced by np.nan.
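As a rough illustration of the rule, assuming the one-column df from the question: the bounds are computed per column, and pandas' df.std() uses the sample standard deviation (ddof=1). With only ten rows nothing can actually fall outside them. With n values and ddof=1, no value can lie more than (n - 1)/sqrt(n) standard deviations from the mean, which is about 2.85 for n = 10.

```python
import math

import pandas as pd

df = pd.DataFrame([0.05, 0.04, 0.08, 0.8, -0.4, 1.0, -1.0, 1.8, -0.5, -0.05])

# per-column bounds; df.std() uses the sample standard deviation (ddof=1)
mean = df.mean()
std = df.std()
lower = mean - 3 * std
upper = mean + 3 * std

# count values outside [lower, upper] in each column
n_outside = ((df < lower) | (df > upper)).sum()

# largest distance from the mean, in units of std, reached by any value
max_z = ((df - mean).abs() / std).max()

# theoretical ceiling on that distance for n rows with ddof=1
n = len(df)
ceiling = (n - 1) / math.sqrt(n)

print(n_outside[0], round(max_z[0], 2), round(ceiling, 2))
```

So on this sample the 3*std rule leaves every value alone; it only starts to bite with at least 11 rows.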

I am only showing one column of my dataframe here, but I actually have eleven columns, and in each of them every value outside that column's own range must be replaced by np.nan, in place within the original df.

Is there a simple way of doing this? I have tried to use lambda functions with where, but it doesn't work.
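DataFrame.where can do this without any lambda: it keeps values where the condition is True and fills NaN elsewhere, so it takes the inverse of the outlier test. A minimal sketch, assuming a hypothetical two-column frame (the column names "a" and "b" and their values are made up for illustration). Because mean() and std() return one value per column, the same line would cover all eleven columns.

```python
import pandas as pd

# hypothetical data: column "a" has one extreme value, column "b" has none
df = pd.DataFrame({
    "a": [0.0] * 19 + [100.0],
    "b": [1.0, -1.0] * 10,
})

mean = df.mean()
std = df.std()

# keep values inside [mean - 3*std, mean + 3*std] of their own column; NaN elsewhere
df = df.where((df >= mean - 3 * std) & (df <= mean + 3 * std))
print(df.tail(3))
```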

CodePudding user response:

Use DataFrame.mask

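import pandas as pd

# per-column mean and standard deviation (Series indexed by column label)
mean = df.mean()
std = df.std()
# True where a value lies outside [mean - 3*std, mean + 3*std] for its column
outliers_mask = (df < mean - 3*std) | (df > mean + 3*std)

# replace outliers with NaN
df = df.mask(outliers_mask)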
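To check the mask approach on eleven columns at once and modify the original df itself, here is a small sketch on made-up data (the column names c0..c10 and all values are hypothetical). It uses mask(..., inplace=True) instead of rebinding df. The comparisons align mean and std with df by column label, so each column gets its own bounds.

```python
import pandas as pd

# hypothetical frame: eleven columns c0..c10, 20 rows each, all values in [-1, 1]
base = [1.0, -1.0] * 10
df = pd.DataFrame({f"c{i}": base for i in range(11)})

# inject one very large and one very small value into two of the columns
df.loc[19, "c3"] = 50.0
df.loc[0, "c7"] = -50.0

mean = df.mean()
std = df.std()
outliers_mask = (df < mean - 3 * std) | (df > mean + 3 * std)

# modify df in place instead of creating a new object
df.mask(outliers_mask, inplace=True)

print(df.isna().sum())
```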

CodePudding user response:

IIUC

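import numpy as np
import pandas as pd

df = pd.DataFrame([0.05, 0.04, 0.08, 0.8, -0.4, 1.0, -1.0, 1.8, -0.5, -0.05])
# note: these bounds are the 25th and 75th percentiles, not mean +/- 3*std
condition_list = [(df[0] < df[0].quantile(.25)) | (df[0] > df[0].quantile(.75))]
choice_list = [np.nan]
# write back into column 0 (df['0'] would add a new column labelled '0')
df[0] = np.select(condition_list, choice_list, df[0])
df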
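The same quantile criterion can be applied to every column at once, since df.quantile returns one value per column. A minimal sketch, reusing the question's one-column data. Note that cutting everything outside the 25th to 75th percentiles blanks out roughly half of each column, a much stricter cut than mean +/- 3*std.

```python
import pandas as pd

df = pd.DataFrame([0.05, 0.04, 0.08, 0.8, -0.4, 1.0, -1.0, 1.8, -0.5, -0.05])

# per-column 25th and 75th percentiles (linear interpolation by default)
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)

# NaN for anything outside [q1, q3] in its own column; works for any number of columns
df = df.mask((df < q1) | (df > q3))
print(df)
```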