How to extract the other row data for an outlier shown in a box plot in Python?

Time:03-15

This is my pandas DataFrame:

Datetime                           SN NO.  Values  data1  data2  data3  data4  data5  data6
2020-09-29T14:59:13.4461479 02:00  701     24.511  3.556  3.557  3.555  3.551  3.559  3.555
2020-09-29T15:48:04.6368679 02:00  702     24.516  3.554  3.555  3.555  3.556  3.552  3.557
2020-09-29T15:51:46.2555875 02:00  703     24.517  3.553  3.556  3.551  3.553  3.558  3.554
2020-10-01T12:51:59.2687665 02:00  704     24.519  3.552  3.557  3.556  3.559  3.557  3.557
2021-02-01T19:27:09.0472459 02:00  705     24.511  3.551  3.558  3.558  3.550  3.551  3.552
...
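import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats

# Draw the box plot of 'Values', grouped by 'Datetime'
boxplot = df.reset_index().boxplot(column=['Values'], by="Datetime", return_type=None)
# Collect the flier (outlier) values that boxplot_stats computes for the whole 'Values' column
outliers = [y for stat in boxplot_stats(df['Values']) for y in stat['fliers']]
print(outliers)
plt.show()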

[box plot image not available]

As the box plot shows, there are some outliers, and I want to extract the other data in the rows that contain those specific values. For example, one outlier is 24.519 in the data frame, and I also need the other data in its row, such as SN NO., data1, data2, data3, and so on. What is the best way to do this?

CodePudding user response:

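To get a DataFrame with all the outlier rows (here outlier_values is the list of flier values, i.e. the outliers list built in the question):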

df_outliers = df.loc[df['Values'].isin(outlier_values), :]
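To see the whole chain in one place, here is a minimal sketch that goes from the raw column to the outlier rows. The data is hypothetical: the first five rows echo the question, and row 706 with Values 24.900 is invented so that there is a flier to find. Only a few of the data columns are included to keep it short.

```python
import pandas as pd
from matplotlib.cbook import boxplot_stats

# Hypothetical sample: rows 701-705 echo the question, row 706 is an
# invented outlier added purely for illustration.
df = pd.DataFrame({
    "SN NO.": [701, 702, 703, 704, 705, 706],
    "Values": [24.511, 24.516, 24.517, 24.519, 24.511, 24.900],
    "data1": [3.556, 3.554, 3.553, 3.552, 3.551, 3.550],
    "data2": [3.557, 3.555, 3.556, 3.557, 3.558, 3.559],
})

# boxplot_stats returns one dict per data set; 'fliers' holds the values
# that fall outside the whiskers (1.5 * IQR by default).
outlier_values = [
    y
    for stat in boxplot_stats(df["Values"].to_numpy())
    for y in stat["fliers"]
]

# Keep every row whose 'Values' entry is one of the fliers.
df_outliers = df.loc[df["Values"].isin(outlier_values), :]
print(df_outliers)
```

Because the fliers are taken from the same column, the exact-match lookup with isin finds them without any rounding issues.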

To get only the row for a single value (here single_value is one outlier value, such as 24.519):

df_outliers = df.loc[df['Values'].eq(single_value), :]

If several rows share the same Values entry, this returns all of them.

To keep only some columns from the original df:

cols = ['data1', 'data2']
df_outliers = df.loc[df['Values'].isin(outlier_values), cols]
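As a possible variation, the rows can also be selected with a boolean mask built from the whisker bounds, instead of matching flier values one by one. This is only a sketch under assumptions: the data is the same hypothetical sample (row 706 is invented), and the mask name is_outlier is made up for illustration. It relies on the whislo and whishi keys that boxplot_stats returns alongside fliers.

```python
import pandas as pd
from matplotlib.cbook import boxplot_stats

# Hypothetical sample: rows 701-705 echo the question, row 706 is an
# invented outlier added purely for illustration.
df = pd.DataFrame({
    "SN NO.": [701, 702, 703, 704, 705, 706],
    "Values": [24.511, 24.516, 24.517, 24.519, 24.511, 24.900],
    "data1": [3.556, 3.554, 3.553, 3.552, 3.551, 3.550],
    "data2": [3.557, 3.555, 3.556, 3.557, 3.558, 3.559],
})

# One data set in, so one stats dict out.
stats = boxplot_stats(df["Values"].to_numpy())[0]

# Fliers are exactly the points below the lower whisker or above the upper one.
is_outlier = (df["Values"] < stats["whislo"]) | (df["Values"] > stats["whishi"])

# Keep only the columns of interest for the outlier rows.
cols = ["SN NO.", "data1", "data2"]
df_outliers = df.loc[is_outlier, cols]
print(df_outliers)
```

The mask keeps the original index, so the result can still be traced back to the matching rows of df.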