I am trying to remove outliers by writing a function:
def remove_outlier_IQR(data):
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    Inter_Q = Q3-Q1
    df_final = [~((data<(Q1 - 1.5*Inter_Q)) | (data>(Q3 + 1.5*Inter_Q)))]
    return df_final
Then I apply it to the columns that contain outliers:
df_outlier_removed=remove_outlier_IQR(df[["Umur","Skor Belanja (1-100)"]])
df_outlier_removed.dropna(axis=0, inplace=True)
df_outlier_removed
However, it raises this error:
AttributeError: 'list' object has no attribute 'dropna'
CodePudding user response:
The error is fairly self-explanatory: your function "remove_outlier_IQR" returns a list, because the expression is wrapped in square brackets:
    df_final = [~((data<(Q1 - 1.5*Inter_Q)) | (data>(Q3 + 1.5*Inter_Q)))]
If you print type(df_final) before you return, you will see that it is a list.
You are then trying to call the pandas DataFrame method "dropna" on a list, which won't work.
To fix it, make your function return a DataFrame object, for example by removing the square brackets and using the boolean mask to index data.
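To see the difference concretely, here is a small sketch on made-up data. Only the column names come from the question; the values and the variable names `mask`, `wrapped` and `cleaned` are invented for illustration.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    "Umur": [21, 23, 25, 27, 29, 31, 95],
    "Skor Belanja (1-100)": [40, 45, 50, 55, 60, 65, 50],
})

data = df[["Umur", "Skor Belanja (1-100)"]]
Q1 = data.quantile(0.25)
Q3 = data.quantile(0.75)
Inter_Q = Q3 - Q1

# Boolean DataFrame: True where a value lies inside the IQR bounds.
mask = ~((data < (Q1 - 1.5 * Inter_Q)) | (data > (Q3 + 1.5 * Inter_Q)))

wrapped = [mask]       # what the original function returns
print(type(wrapped))   # a plain Python list, which has no .dropna()
print(type(mask))      # a pandas DataFrame of booleans

# Indexing with the mask keeps the DataFrame type; outliers become NaN,
# so dropna() can then remove the affected rows.
cleaned = data[mask].dropna(axis=0)
print(cleaned)
```

In this sample only the row with Umur 95 falls outside the bounds, so it is the only row dropped.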
CodePudding user response:
You created a list and kept it as a list throughout the function. If you want to use DataFrame.dropna(), then this would work for you:
def remove_outlier_IQR(data):
    Q1 = data.quantile(0.25)
    Q3 = data.quantile(0.75)
    Inter_Q = Q3-Q1
    # True where a value lies inside the IQR bounds
    mask = ~((data<(Q1 - 1.5*Inter_Q)) | (data>(Q3 + 1.5*Inter_Q)))
    # Indexing with the mask returns a DataFrame with outliers set to NaN
    df_final = data[mask]
    return df_final
Indexing data with the boolean mask returns a DataFrame in which every outlier is replaced by NaN, so the dropna() call afterwards removes those rows.
CodePudding user response:
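You can do this in a better way
(assuming your input is a DataFrame; if it is a list, convert it to a DataFrame first):
    df = data[(data>=(Q1 - 1.5*Inter_Q)) & (data<=(Q3 + 1.5*Inter_Q))].reset_index(drop=True)
It is much faster. With a DataFrame, values outside the bounds become NaN, so call dropna() on the result to drop those rows.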
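If the goal is to drop every row that has an outlier in any of the selected columns, one possible variation is to reduce the mask to one boolean per row and filter with that, which skips the NaN step entirely. This is only a sketch: the helper name `remove_outlier_rows`, its `k` parameter and the sample values are hypothetical, not from the question.

```python
import pandas as pd

def remove_outlier_rows(data: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Return only the rows whose values all lie inside the IQR bounds."""
    q1 = data.quantile(0.25)
    q3 = data.quantile(0.75)
    iqr = q3 - q1
    # Boolean DataFrame: True where a value is inside [q1 - k*iqr, q3 + k*iqr].
    within = (data >= q1 - k * iqr) & (data <= q3 + k * iqr)
    # Keep a row only if every selected column is inside its bounds.
    return data[within.all(axis=1)].reset_index(drop=True)

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    "Umur": [21, 23, 25, 27, 29, 31, 95],
    "Skor Belanja (1-100)": [40, 45, 50, 55, 60, 65, 50],
})

result = remove_outlier_rows(df[["Umur", "Skor Belanja (1-100)"]])
print(result)
```

Because the rows are filtered with a boolean Series instead of being masked to NaN, no dropna() call is needed afterwards.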