I'm trying to apply the code from this post to a subset, .iloc[:,6:], of my dataset df2_clean.
Original code:
for col in data.columns:
    mean = data[col].mean()
    std = data[col].std()
    N = 1.5
    upper = mean + N*std
    lower = mean - N*std
    median = data[col].median()
    newcol = []
    for val in data[col]:
        if val < lower or val > upper:
            newcol.append(median)
        else:
            newcol.append(val)
    data[col] = newcol
My issue is that the list newcol is computed correctly, but the line df2_clean.iloc[:,6:][col] = newcol doesn't seem to have any effect.
It is really confusing, because when I add two print calls around it:
print(max(newcol))
df2_clean.iloc[:,6:][col] = newcol
print(df2_clean.iloc[:,6:][col].max(),"\n")
The output is:
13.59324074073811
85.57633101852116
My entire code (basically the same, but just in case) is below:
for col in df2_clean.iloc[:,6:].columns:
    mean = df2_clean.iloc[:,6:][col].mean()
    std = df2_clean.iloc[:,6:][col].std()
    N = 1
    upper = mean + N*std
    lower = mean - N*std
    median = df2_clean.iloc[:,6:][col].median()
    newcol = []
    for val in df2_clean.iloc[:,6:][col]:
        if val < lower or val > upper:
            newcol.append(median)
        else:
            newcol.append(val)
    print(max(newcol))
    df2_clean.iloc[:,6:][col] = newcol
    print(df2_clean.iloc[:,6:][col].max(),"\n")
CodePudding user response:
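You don't need iloc on every line: df2_clean.iloc[:,6:][col] selects the same data as df2_clean[col]. The assignment is where it matters. df2_clean.iloc[:,6:] builds a new DataFrame, so df2_clean.iloc[:,6:][col] = newcol sets the column on that temporary object and df2_clean itself is left unchanged. You only need iloc in the for statement, to restrict the loop to the columns from position 6 onward.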
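To make the difference concrete, here is a minimal sketch on a made-up frame (df, id and x are hypothetical stand-ins for df2_clean and its columns). The explicit .copy() is my addition: whether a plain iloc slice behaves as a view or a copy has varied between pandas versions, so the copy makes the "lost write" deterministic for the demonstration.

```python
import pandas as pd

# Hypothetical toy frame standing in for df2_clean.
df = pd.DataFrame({"id": [1, 2, 3], "x": [1.0, 2.0, 99.0]})

# Writing to a column of the sliced frame changes the slice only.
# .copy() is used here so the behaviour does not depend on the pandas version.
subset = df.iloc[:, 1:].copy()
subset["x"] = [1.0, 2.0, 2.0]
max_after_subset_write = df["x"].max()   # original still contains 99.0

# Writing through the original frame updates it.
df["x"] = [1.0, 2.0, 2.0]
max_after_direct_write = df["x"].max()   # now 2.0

print(max_after_subset_write, max_after_direct_write)
```

This mirrors the two print calls in the question: the first maximum comes from the untouched original, the second from a frame that was actually assigned to.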
Here's your loop cleaned up a little; it should work this way:
for col in df2_clean.iloc[:,6:].columns:
    mean = df2_clean[col].mean()
    std = df2_clean[col].std()
    N = 1
    upper = mean + N*std
    lower = mean - N*std
    median = df2_clean[col].median()
    # Replace values outside [lower, upper] with the column median
    df2_clean[col] = df2_clean[col].apply(lambda val: median if val < lower or val > upper else val)
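If the per-value lambda turns out to be slow on a large frame, the same replacement can probably be done with a boolean mask instead. This is only a sketch: the function name replace_outliers_with_median, its parameters, and the toy data are all made up for illustration. It returns a cleaned copy rather than modifying the input.

```python
import pandas as pd


def replace_outliers_with_median(df, cols, n_std=1.0):
    """Return a copy of df in which, for each column in cols, values more
    than n_std standard deviations from the column mean are replaced by
    the column median. (Hypothetical helper, not from the original post.)"""
    out = df.copy()
    for col in cols:
        s = out[col]
        mean, std = s.mean(), s.std()
        lower = mean - n_std * std
        upper = mean + n_std * std
        is_outlier = (s < lower) | (s > upper)
        # Series.mask replaces the entries where the condition is True.
        out[col] = s.mask(is_outlier, s.median())
    return out


# Toy data: only column "x" (position 1 onward) is cleaned.
toy = pd.DataFrame({"id": [1, 2, 3, 4, 5], "x": [1.0, 2.0, 3.0, 4.0, 100.0]})
cleaned = replace_outliers_with_median(toy, toy.columns[1:], n_std=1.0)
print(cleaned)
```

With your data the call would look something like replace_outliers_with_median(df2_clean, df2_clean.columns[6:], n_std=1), assigning the result back to df2_clean.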