How can we automatically detect skewness in the data, and if skewness is present, how can we remove it?

Time:06-01

Here I have tried the power transform technique to detect outliers and remove them, but it's not working and I don't know why. If anyone has another suggestion, please share it.

Suppose I have a dataset in which skewness is present. I need to define a function that detects skewness above a certain threshold in every column of the dataset, removes it, and returns the transformed data.

# Removing outliers
import pandas as pd
from sklearn.preprocessing import PowerTransformer
def remove_skewness(x):
    value = x.skew().values
    for skew in value:
        if skew > 4.0:
            #skewness removal
            pt = PowerTransformer(method='yeo-johnson')
            X_power = pt.fit_transform(x)
            df1 = pd.DataFrame(X_power, columns=x.columns)
            print("Skewness is Detected and will be Removed:")
            return df1
        else:
            print("Skewness not Detected:")
            return x
        
df2 = remove_skewness(df_new)
df2.head()

CodePudding user response:

Your code seems to be fine. I checked it and it works as intended, as shown in the screenshot [image not available].

After loop: [image not available]

You might want to check the condition in your if statement, if skew > 4.0:, and see whether your outlier columns ever produce a skewness above 4. Simply use print(x.skew().values) and look at the value for each column.
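A minimal sketch of that check, using a small made-up frame rather than the asker's df_new (the column names "balanced" and "income" are hypothetical). It prints the per-column skewness and lists the columns that exceed the 4.0 threshold:

```python
import numpy as np
import pandas as pd

# Hypothetical demo data: one symmetric column, one with a single extreme value.
df_demo = pd.DataFrame({
    "balanced": np.arange(100, dtype=float),
    "income": np.array(list(range(1, 100)) + [100000], dtype=float),
})

threshold = 4.0  # same cut-off as in the question

# Series of skewness values indexed by column name.
skews = df_demo.skew(numeric_only=True)
print(skews)

# Columns whose skewness exceeds the threshold.
flagged = skews[skews > threshold].index.tolist()
print("Columns above threshold:", flagged)
```

Keeping the result as a Series (instead of calling .values) preserves the column names, which makes it easier to see which column triggers the condition.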

If all values are below 4, the code will never enter the if statement that applies the PowerTransformer.
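One more thing that may be worth checking: in the function as posted, both branches return inside the loop, so the decision is made on the first column's skewness only. The following is a rough sketch of a per-column variant, not the asker's exact method. It assumes all relevant columns are numeric, uses the absolute skewness so that strongly left-skewed columns are also caught (an assumption, since the original only tests skew > 4.0), and runs on hypothetical demo data; the name reduce_skewness is made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer


def reduce_skewness(df, threshold=4.0):
    """Return a copy of df with Yeo-Johnson applied only to the skewed numeric columns."""
    out = df.copy()
    skews = out.skew(numeric_only=True)
    # Assumption: treat large negative skew the same as large positive skew.
    skewed_cols = skews[skews.abs() > threshold].index.tolist()

    if not skewed_cols:
        print("Skewness not detected")
        return out

    print("Skewness detected in:", skewed_cols)
    pt = PowerTransformer(method="yeo-johnson")
    # Transform only the flagged columns; the rest are left unchanged.
    out[skewed_cols] = pt.fit_transform(out[skewed_cols])
    return out


# Hypothetical demo data, not the asker's df_new.
df_demo = pd.DataFrame({
    "balanced": np.arange(100, dtype=float),
    "income": np.array(list(range(1, 100)) + [100000], dtype=float),
})

df_fixed = reduce_skewness(df_demo)
print(df_fixed.skew())
```

Note that PowerTransformer standardizes its output by default, so the transformed columns come back with zero mean and unit variance while the untouched columns keep their original scale.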
