Using IQR method to remove outliers does not change shape of data frame

I'm trying to remove outliers using the IQR method. However, the shape of my df remains the same.

Here is the code:

def IQR_outliers(df):

     Q1=df.quantile(0.25)
     Q3=df.quantile(0.75)
     IQR=Q3-Q1
     df=df[~((df<(Q1-1.5*IQR)) | (df>(Q3+1.5*IQR)))]
     return df
    
IQR_outliers(df['Distance'])
IQR_outliers(df['Price'])

CodePudding user response：

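Your function filters whatever object is passed to it, but you're only passing a single Series each time you use it, so only that Series gets filtered, not the dataframe. You're also not capturing the output, so the filtered result is thrown away and df keeps its original shape. These things stack on top of each other to produce the behavior you're seeing.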
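To make that concrete, here is a minimal sketch on made-up data (the values are hypothetical; only the column names come from the question). It calls the original function on one column and shows that a new, shorter Series comes back while df itself is untouched:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Distance": [1.0, 2.0, 3.0, 4.0, 100.0],
    "Price": [10.0, 11.0, 12.0, 13.0, 14.0],
})

def IQR_outliers(s):
    # Same logic as the question's function, applied to whatever is passed in.
    Q1 = s.quantile(0.25)
    Q3 = s.quantile(0.75)
    IQR = Q3 - Q1
    return s[~((s < (Q1 - 1.5 * IQR)) | (s > (Q3 + 1.5 * IQR)))]

# The call returns a new, filtered Series; it does not modify df.
filtered = IQR_outliers(df["Distance"])

print(filtered.shape)  # (4,)
print(df.shape)        # (5, 2)
```

The value 100.0 is dropped from the returned Series, but since nothing is assigned back to df, its shape stays the same.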

So here's what I would do:

Add a column argument to your function.
Modify the function to consider only that column when selecting rows from the entire dataframe.
Pipe the dataframe through that function once per column.

So that's:


def IQR_outliers(df, column):

     Q1 = df[column].quantile(0.25)
     Q3 = df[column].quantile(0.75)
     IQR = Q3 - Q1
     df = df.loc[lambda df: ~((df[column] < (Q1 - 1.5 * IQR)) | (df[column] > (Q3 + 1.5 * IQR)))]
     return df
    

revised_df = df.pipe(IQR_outliers, 'Distance').pipe(IQR_outliers, 'Price')
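If you want to check this end to end, here is a small self-contained sketch. The sample values are made up for illustration, and the function is the one above with the upper bound written as Q3 + 1.5 * IQR:

```python
import pandas as pd

def IQR_outliers(df, column):
    # Compute the IQR bounds from one column only.
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    # Keep the rows of the whole dataframe whose value in that column is within bounds.
    keep = ~((df[column] < (Q1 - 1.5 * IQR)) | (df[column] > (Q3 + 1.5 * IQR)))
    return df.loc[keep]

# Hypothetical sample data: one Distance outlier (100.0) and one Price outlier (500.0).
df = pd.DataFrame({
    "Distance": [1.0, 2.0, 3.0, 4.0, 100.0, 3.0],
    "Price": [10.0, 11.0, 12.0, 13.0, 12.0, 500.0],
})

revised_df = df.pipe(IQR_outliers, "Distance").pipe(IQR_outliers, "Price")

print(df.shape)          # (6, 2)
print(revised_df.shape)  # (4, 2)
```

Keep in mind that the second pipe computes its quartiles on the rows that survived the first one, so the order of the columns can affect which rows are kept.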

Note that, the way you've set this up, a row gets dropped whenever either column holds an outlier, so you'll drop rows where Distance is an outlier even if Price is not. If you don't want to do that, you'll need to stack your dataframe, apply this function within a groupby operation, and then optionally unstack the dataframe.
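As a rough sketch of that idea, the example below does not use the stack/groupby route. Instead it swaps in a simpler column-wise mask that aims for a similar result: outlying values become NaN in their own column and every row is kept. The helper name mask_iqr_outliers and the sample data are hypothetical.

```python
import pandas as pd

def mask_iqr_outliers(df, columns):
    # Hypothetical helper: blank out outliers per column instead of dropping rows.
    sub = df[columns]
    Q1 = sub.quantile(0.25)
    Q3 = sub.quantile(0.75)
    IQR = Q3 - Q1
    # Comparing a DataFrame with a Series aligns the Series on the column labels,
    # so each column is checked against its own bounds.
    is_outlier = (sub < (Q1 - 1.5 * IQR)) | (sub > (Q3 + 1.5 * IQR))
    out = df.copy()
    out[columns] = sub.mask(is_outlier)  # outliers become NaN
    return out

# Hypothetical sample data: row 4 has a Distance outlier, row 5 a Price outlier.
df = pd.DataFrame({
    "Distance": [1.0, 2.0, 3.0, 4.0, 100.0, 3.0],
    "Price": [10.0, 11.0, 12.0, 13.0, 12.0, 500.0],
})

masked = mask_iqr_outliers(df, ["Distance", "Price"])
print(masked)
```

Here the Price in row 4 survives even though its Distance is masked. Unlike the chained pipe version, each column's bounds are computed on the full, unfiltered column.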