AttributeError: 'list' object has no attribute 'dropna' (outlier)

I am trying to remove outliers by writing a function:

def remove_outlier_IQR(data):
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    Inter_Q = Q3-Q1
    df_final = [~((data<(Q1 - 1.5*Inter_Q)) | (data>(Q3 + 1.5*Inter_Q)))]
    return df_final

Then I apply the function to the columns that contain outliers:

df_outlier_removed=remove_outlier_IQR(df[["Umur","Skor Belanja (1-100)"]])
df_outlier_removed.dropna(axis=0, inplace=True)
df_outlier_removed

However, it raises this error:

AttributeError: 'list' object has no attribute 'dropna'

CodePudding user response：

The error is pretty self-explanatory: your function "remove_outlier_IQR" returns a list, because the expression is wrapped in square brackets:

df_final = [~((data<(Q1 - 1.5*Inter_Q)) | (data>(Q3 + 1.5*Inter_Q)))]

If you add a print statement for df_final before you return, you will see that it is a list.

You are then trying to call the Pandas DataFrame member function "dropna" on a list, which won't work.

To fix this, make your function return a DataFrame object.
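The check described above can be sketched as follows. This is only an illustration: the sample values are made up, and the function is renamed remove_outlier_IQR_original here to mark it as the version from the question.

```python
import pandas as pd

# Hypothetical sample data; the column name comes from the question, the values are made up.
df = pd.DataFrame({"Umur": [20, 22, 23, 25, 24, 21, 95]})

def remove_outlier_IQR_original(data):
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    Inter_Q = Q3 - Q1
    # The square brackets build a one-element Python list around the boolean mask.
    return [~((data < (Q1 - 1.5 * Inter_Q)) | (data > (Q3 + 1.5 * Inter_Q)))]

result = remove_outlier_IQR_original(df[["Umur"]])
print(type(result))               # the return value is a plain list
print(type(result[0]))            # its only element is a boolean DataFrame
print(hasattr(result, "dropna"))  # lists have no dropna method
```

The list holds the mask, not the filtered data, so even result[0] would not yet be the cleaned DataFrame.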

CodePudding user response：

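Your function wraps the boolean mask in a list and returns that list. If you want to use DataFrame.dropna(), index the original data with the mask so that the function returns a DataFrame:

def remove_outlier_IQR(data):
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    Inter_Q = Q3-Q1
    # Boolean mask: True where a value lies inside the IQR fences
    mask = ~((data<(Q1 - 1.5*Inter_Q)) | (data>(Q3 + 1.5*Inter_Q)))
    df_final = data[mask]
    return df_final

Indexing with the mask keeps the result a DataFrame and replaces out-of-range values with NaN, so the later dropna() call removes those rows. Note that wrapping the bracketed list in pd.DataFrame() would not help: the list holds a single two-dimensional mask, not the filtered data.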
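As a rough end-to-end sketch, one way to get a DataFrame back is to index the data with the boolean mask and then call dropna(). The sample values below are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; 95 is an obvious outlier in "Umur".
df = pd.DataFrame({
    "Umur": [20, 22, 23, 25, 24, 21, 95],
    "Skor Belanja (1-100)": [40, 45, 50, 55, 60, 42, 48],
})

def remove_outlier_IQR(data):
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    Inter_Q = Q3 - Q1
    # True where a value lies inside the IQR fences of its column.
    mask = ~((data < (Q1 - 1.5 * Inter_Q)) | (data > (Q3 + 1.5 * Inter_Q)))
    # Indexing a DataFrame with a boolean DataFrame keeps the shape
    # and puts NaN where the mask is False.
    return data[mask]

cols = ["Umur", "Skor Belanja (1-100)"]
# Rows that contain a NaN (a masked outlier) are dropped here.
df_outlier_removed = remove_outlier_IQR(df[cols]).dropna(axis=0)
print(df_outlier_removed)
```

Because masked cells become NaN, integer columns may come back as floats after this step.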

CodePudding user response：

You can do this more simply.

(This assumes your input is a DataFrame; if it is a list, convert it to a DataFrame first.)

df = data[(data>=(Q1 - 1.5*Inter_Q)) & (data<=(Q3 + 1.5*Inter_Q))].reset_index(drop=True)

It is also much faster.
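One caveat worth sketching: when data is a DataFrame with several columns, indexing with an element-wise mask leaves NaN in place of outliers rather than dropping rows. A possible variant, shown here with made-up sample values, reduces the mask to one boolean per row with all(axis=1) so that rows are dropped in a single step. The names in_range and df_filtered are illustrative only.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Umur": [20, 22, 23, 25, 24, 21, 95],
    "Skor Belanja (1-100)": [40, 45, 50, 55, 60, 42, 48],
})

cols = ["Umur", "Skor Belanja (1-100)"]
data = df[cols]

Q1 = data.quantile(0.25)
Q3 = data.quantile(0.75)
Inter_Q = Q3 - Q1

# Element-wise mask: True where a value is inside its column's IQR fences.
in_range = (data >= (Q1 - 1.5 * Inter_Q)) & (data <= (Q3 + 1.5 * Inter_Q))

# Keep a row only if every selected column is in range, then renumber the index.
df_filtered = df[in_range.all(axis=1)].reset_index(drop=True)
print(df_filtered)
```

Since no NaN values are introduced, the integer columns keep their dtype and no dropna() call is needed.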