I have tried using a power transform to detect and remove outliers, but it is not working and I don't know why. If anyone has a better suggestion, please share it.
Suppose I have a dataset in which some columns are skewed. I need to define a function that detects skewness above a certain threshold in every column of the dataset, removes it, and returns the transformed data.
# Removing outliers
import pandas as pd
from sklearn.preprocessing import PowerTransformer

def remove_skewness(x):
    # Per-column skewness of the input DataFrame
    value = x.skew().values
    for skew in value:
        if skew > 4.0:
            # Skewness removal: Yeo-Johnson transform applied to all columns
            pt = PowerTransformer(method='yeo-johnson')
            X_power = pt.fit_transform(x)
            df1 = pd.DataFrame(X_power, columns=x.columns, index=x.index)
            print("Skewness is Detected and will be Removed:")
            return df1
    # Reached only when no column exceeds the threshold
    print("Skewness not Detected:")
    return x

df2 = remove_skewness(df_new)
df2.head()
CodePudding user response:
Your code seems to be fine. I checked it and it works as intended.
You might want to check your if statement, if skew > 4.0:, and check whether your outlier columns ever produce a skewness above 4. Simply use print(x.skew().values)
and look at the value for each column.
If all values are at or below 4, the function will never enter the if statement that uses the PowerTransformer.
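If it helps, here is a minimal sketch of one way to do the per-column check, not a definitive implementation. It assumes a numeric DataFrame; the function name reduce_skewness, the demo data, and the choice to compare the absolute skewness (so strong negative skew is also caught) are my own assumptions rather than part of the original code. It prints the skewness of every column and transforms only the columns that exceed the threshold.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer


def reduce_skewness(df, threshold=4.0):
    """Return (copy of df with skewed columns transformed, list of those columns)."""
    skew = df.skew(numeric_only=True)
    print("Skewness per column:")
    print(skew)
    # abs() also catches strong negative skew, which `skew > threshold` would miss
    skewed_cols = skew[skew.abs() > threshold].index.tolist()
    out = df.copy()
    if not skewed_cols:
        print("No column exceeds the threshold; data returned unchanged.")
        return out, skewed_cols
    # Yeo-Johnson is fitted only on the columns that were flagged
    pt = PowerTransformer(method="yeo-johnson")
    out[skewed_cols] = pt.fit_transform(df[skewed_cols])
    return out, skewed_cols


# Hypothetical demo data: one heavily right-skewed column, one symmetric column
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "skewed": rng.lognormal(mean=0.0, sigma=2.0, size=2000),
    "normal": rng.normal(size=2000),
})
result, changed = reduce_skewness(demo, threshold=4.0)
print("Transformed columns:", changed)
```

Note that PowerTransformer standardizes its output by default (zero mean, unit variance), so the transformed columns are on a different scale from the untouched ones. If the printed skewness values never exceed your threshold, lowering the threshold is probably the first thing to try.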