I use this code to remove outliers from my DataFrame df:
import numpy as np
from scipy import stats
df = df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]
How do I get a list of the index labels of all the rows I deleted? Thanks!
CodePudding user response:
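To get the index of the deleted rows, run this before you filter df. A row is dropped when any column has |z| >= 3, so the condition needs any(axis=1), not all(axis=1):
index = df[(np.abs(stats.zscore(df)) >= 3).any(axis=1)].index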
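Here is a minimal sketch of the same idea that builds the mask once and reuses it. The DataFrame contents and the names keep and removed_index are made up for illustration, and it assumes every column is numeric.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: 19 ordinary rows plus one extreme value in column "a".
df = pd.DataFrame({
    "a": [1.0] * 19 + [100.0],
    "b": np.arange(20, dtype=float),
})

# Column-wise z-scores as a plain NumPy array.
z = np.abs(stats.zscore(df.to_numpy(), axis=0))

# True for rows where every column has |z| < 3 (the rows to keep).
keep = (z < 3).all(axis=1)

# Capture the dropped labels before overwriting df.
removed_index = df.index[~keep].tolist()
df = df[keep]
```

Negating the keep mask, instead of re-testing with >= 3, also catches rows that were dropped because a z-score came out as NaN, since NaN fails both comparisons.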
CodePudding user response:
index = list(df.index.values)  # labels before filtering
df = df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]
removed_index = []
index_after_remove = set(df.index.values)  # set gives fast membership tests
for i in index:
    if i not in index_after_remove:
        removed_index.append(i)
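The before/after comparison can also be done without a loop using pandas' Index.difference. This is only a sketch: the sample data and the name original_index are hypothetical, and it assumes the index labels are unique.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data with a non-default index (labels 100..119).
df = pd.DataFrame(
    {"a": [1.0] * 19 + [100.0], "b": np.arange(20, dtype=float)},
    index=range(100, 120),
)

original_index = df.index  # labels before filtering

# Same filter as in the question, applied to the underlying array.
df = df[(np.abs(stats.zscore(df.to_numpy(), axis=0)) < 3).all(axis=1)]

# Labels present before the filter but missing afterwards.
removed_index = original_index.difference(df.index).tolist()
```

Note that difference returns the labels sorted where possible, so the order may differ from the original row order.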