function to drop outliers

Time:02-13

I created a function to drop my outliers. Here is the function:

def dropping_outliers(train, condition):
    drop_index = train[condition].index
    #print(drop_index)
    train = train.drop(drop_index,axis = 0)

and when I do

dropping_outliers(train, ((train.SalePrice<100000)  & (train.LotFrontage>150)))

Nothing is being dropped. However, when I execute the function's steps manually, i.e. get the index in the dataframe for this condition, I do get a valid index (943), and when I do

train = train.drop([943],axis = 0)

Then the row I want is dropped correctly. I don't understand why the function doesn't work, as it's supposed to be doing exactly what I am doing manually.

CodePudding user response:

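At the end of dropping_outliers, the result of drop is assigned to the local name train. That only rebinds the name inside the function; the dataframe you passed in is not altered, and because the function returns nothing, the new dataframe is discarded. Try this instead: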

def dropping_outliers(train, condition):
    drop_index = train[condition].index
    #print(drop_index)
    return train.drop(drop_index,axis = 0)

Then do the assignment when you call the function:

train = dropping_outliers(train, ((train.SalePrice<100000)  & (train.LotFrontage>150)))
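As a quick sanity check, here is a minimal sketch of the returning version run on a made-up three-row frame. The values and index labels (including 943) are invented for illustration and are not taken from the real dataset.

```python
import pandas as pd


def dropping_outliers(train, condition):
    # Collect the index labels of the rows matching the condition
    drop_index = train[condition].index
    # drop() returns a new DataFrame, so hand it back to the caller
    return train.drop(drop_index, axis=0)


# Hypothetical toy data: only the row labelled 943 matches the condition
train = pd.DataFrame(
    {"SalePrice": [200000, 90000, 95000], "LotFrontage": [80, 160, 60]},
    index=[10, 943, 20],
)

rows_before = len(train)
train = dropping_outliers(train, (train.SalePrice < 100000) & (train.LotFrontage > 150))
rows_after = len(train)

print(rows_before, rows_after)   # 3 2
print(943 in train.index)        # False
```

Note that the condition is evaluated on train before the call, which is fine here because the function does not change the frame it receives.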

Also see the related question "python pandas dataframe, is it pass-by-value or pass-by-reference".
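To make the distinction concrete, here is a small sketch contrasting rebinding a local name with modifying the passed object in place. The function names rebind_only and mutate_in_place and the toy frame are hypothetical, chosen only for this illustration.

```python
import pandas as pd


def rebind_only(df):
    # Rebinds the local name `df` to a new DataFrame;
    # the caller's object is left untouched.
    df = df.drop(df.index[0], axis=0)


def mutate_in_place(df):
    # inplace=True modifies the very object the caller passed in.
    df.drop(df.index[0], axis=0, inplace=True)


data = pd.DataFrame({"x": [1, 2, 3]})

rebind_only(data)
len_after_rebind = len(data)    # still 3: nothing visible to the caller changed

mutate_in_place(data)
len_after_mutate = len(data)    # 2: the first row is gone

print(len_after_rebind, len_after_mutate)
```

Returning the new dataframe, as in the answer above, is usually the easier pattern to reason about; the in-place variant is shown only to illustrate why the original function had no visible effect.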
