Pandas: replacing outliers in a column with a new list of values

I am trying to use the code from this post on a subset, .iloc[:,6:], of my dataset df2_clean.

Original code:

for col in data.columns:
    mean = data[col].mean()
    std = data[col].std()

    N = 1.5
    upper = mean + N*std
    lower = mean - N*std

    median = data[col].median()
    newcol = []
    for val in data[col]:
        if val < lower or val > upper:
            newcol.append(median)
        else:
            newcol.append(val)

    data[col] = newcol
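For a concrete sense of the thresholds, here is the same rule applied to a small made-up column (the values are hypothetical, chosen only for illustration). Note that pandas' .std() computes the sample standard deviation (ddof=1). For [1, 2, 3, 4, 100] the mean is 22 and the std is about 43.62, so with N = 1.5 the bounds are about -43.43 and 87.43, and only 100 is replaced by the median, 3.

```python
import pandas as pd

# Hypothetical toy column with a single obvious outlier.
s = pd.Series([1, 2, 3, 4, 100])

N = 1.5
mean, std = s.mean(), s.std()  # .std() is the sample standard deviation (ddof=1)
lower, upper = mean - N * std, mean + N * std
median = s.median()

# Same rule as the loop above: out-of-range values become the median.
cleaned = [median if (v < lower or v > upper) else v for v in s]

print(round(lower, 2), round(upper, 2), median)
print(cleaned)
```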

My issue is that the list newcol is computed correctly, but the line df2_clean.iloc[:,6:][col] = newcol does not seem to update the column. It is really confusing, because when I add two print calls:

print(max(newcol))
df2_clean.iloc[:,6:][col] = newcol
print(df2_clean.iloc[:,6:][col].max(),"\n")

The result is

13.59324074073811
85.57633101852116

Here is my entire code (basically the same, but just in case):

for col in df2_clean.iloc[:,6:].columns:
    mean = df2_clean.iloc[:,6:][col].mean()
    std = df2_clean.iloc[:,6:][col].std()
    
    N = 1
    upper = mean + N*std
    lower = mean - N*std
    median = df2_clean.iloc[:,6:][col].median()
    newcol = []
    for val in df2_clean.iloc[:,6:][col]:
        if val < lower or val > upper:
            newcol.append(median)
        else:
            newcol.append(val)
    print(max(newcol))
    df2_clean.iloc[:,6:][col] = newcol
    print(df2_clean.iloc[:,6:][col].max(),"\n")

Answer:

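You don't need .iloc everywhere: reading df2_clean.iloc[:,6:][col] gives the same values as df2_clean[col]. The problem is the assignment. df2_clean.iloc[:,6:][col] = newcol is chained assignment: .iloc[:,6:] returns a new DataFrame object, and [col] = newcol sets the column on that temporary object rather than on df2_clean, so the change is lost (pandas may warn about this with a SettingWithCopyWarning). You only need .iloc[:,6:] in the for statement, to select which columns to loop over; inside the loop, read and assign through df2_clean[col] directly.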
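A minimal sketch of the working pattern, using a small made-up DataFrame (the column names and values are hypothetical). .iloc is used only to choose the columns, and every assignment goes through df[col], so the original frame is updated. The chained form is left as a comment because whether it ever reaches df depends on the pandas version and internal layout; under Copy-on-Write it never does.

```python
import pandas as pd

# Hypothetical frame: one label column followed by numeric columns.
df = pd.DataFrame({
    "label": ["a", "b", "c"],
    "v1": [1.0, 2.0, 3.0],
    "v2": [10.0, 20.0, 30.0],
})

# Chained form (as in the question), not used here:
#   df.iloc[:, 1:]["v1"] = [...]
# It sets "v1" on the temporary DataFrame returned by .iloc, not reliably on df.

# Direct form: .iloc only picks the column names, assignment goes through df[col].
for col in df.iloc[:, 1:].columns:
    df[col] = df[col] * 2

print(df)
```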

Here's your loop cleaned up a little; it should work this way:

for col in df2_clean.iloc[:,6:].columns:
    mean = df2_clean[col].mean()
    std = df2_clean[col].std()
    
    N = 1
    upper = mean + N*std
    lower = mean - N*std
    median = df2_clean[col].median()
    
    df2_clean[col] = df2_clean[col].apply(lambda val: median if val < lower or val > upper else val)
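As an alternative to apply with a lambda, the same replacement can be written with a boolean mask and Series.mask, which replaces values where the condition is True and avoids calling a Python function for every value. This is a sketch, not the only way to do it: the helper name replace_outliers_with_median and its parameters start and n_std are hypothetical. As in the loop above, NaN values compare False against both bounds and are therefore kept.

```python
import pandas as pd


def replace_outliers_with_median(df: pd.DataFrame, start: int = 6, n_std: float = 1.0) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df where, in every column from
    position `start` on, values outside mean +/- n_std * std are replaced
    by that column's median."""
    out = df.copy()
    for col in out.columns[start:]:
        s = out[col]
        mean, std, median = s.mean(), s.std(), s.median()
        lower, upper = mean - n_std * std, mean + n_std * std
        # mask() swaps in `median` wherever the condition is True.
        out[col] = s.mask((s < lower) | (s > upper), median)
    return out


# Usage with hypothetical data: one label column, then one numeric column.
df = pd.DataFrame({"label": list("abcde"), "x": [1, 2, 3, 4, 100]})
cleaned = replace_outliers_with_median(df, start=1, n_std=1.5)
print(cleaned)
```

Because the helper returns a copy, the input frame is left unchanged; to update your own data, assign the result back, e.g. df2_clean = replace_outliers_with_median(df2_clean).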