This is my pandas DataFrame:
Datetime | SN NO. | Values | data1 | data2 | data3 | data4 | data5 | data6 |
---|---|---|---|---|---|---|---|---|
2020-09-29T14:59:13.4461479 02:00 | 701 | 24.511 | 3.556 | 3.557 | 3.555 | 3.551 | 3.559 | 3.555 |
2020-09-29T15:48:04.6368679 02:00 | 702 | 24.516 | 3.554 | 3.555 | 3.555 | 3.556 | 3.552 | 3.557 |
2020-09-29T15:51:46.2555875 02:00 | 703 | 24.517 | 3.553 | 3.556 | 3.551 | 3.553 | 3.558 | 3.554 |
2020-10-01T12:51:59.2687665 02:00 | 704 | 24.519 | 3.552 | 3.557 | 3.556 | 3.559 | 3.557 | 3.557 |
2021-02-01T19:27:09.0472459 02:00 | 705 | 24.511 | 3.551 | 3.558 | 3.558 | 3.550 | 3.551 | 3.552 |
. | . | . | . | . | . | . | . | . |
import matplotlib.pyplot as plt
from matplotlib.cbook import boxplot_stats

# Box plot of 'Values', grouped by 'Datetime'
boxplot = df.reset_index().boxplot(column=['Values'], by="Datetime", return_type=None)
# Collect the outlier (flier) values of the 'Values' column
outliers = [y for stat in boxplot_stats(df['Values']) for y in stat['fliers']]
print(outliers)
plt.show()
[Box plot image: no longer available]
As the box plot shows, there are some outliers, but I want to extract the rest of the data in the rows that contain those specific values. For example, one outlier in the data frame is 24.519, but I also need the other data for that value, such as SN NO., data1, data2, data3, and so on. What is the best way to do it?
CodePudding user response:
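To get a DataFrame with all the outlier rows (outlier_values is the list of outlier values, for example the outliers list from the question):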
df_outliers = df.loc[df['Values'].isin(outlier_values), :]
To get the row for a single value:
df_outliers = df.loc[df['Values'].eq(single_value), :]
If multiple rows share the same value in the Values column, this returns all of them.
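As a minimal sketch of the two selections above, here is a self-contained run. The first five rows are copied from the question's table (only some columns are kept); the sixth row (SN NO. 706) is made up for illustration, because the five sample rows on their own produce no fliers under the default 1.5 × IQR rule.

```python
import pandas as pd
from matplotlib.cbook import boxplot_stats

# Rows 701-705 come from the question; row 706 is a hypothetical outlier.
df = pd.DataFrame({
    "SN NO.": [701, 702, 703, 704, 705, 706],
    "Values": [24.511, 24.516, 24.517, 24.519, 24.511, 25.900],
    "data1": [3.556, 3.554, 3.553, 3.552, 3.551, 3.550],
    "data2": [3.557, 3.555, 3.556, 3.557, 3.558, 3.556],
})

# Outlier values as reported by matplotlib's box plot statistics
stats = boxplot_stats(df["Values"].to_numpy())
outlier_values = [y for stat in stats for y in stat["fliers"]]

# Full rows whose 'Values' entry is one of the outliers
df_outliers = df.loc[df["Values"].isin(outlier_values), :]
print(df_outliers)

# Full rows for one specific value (both matching rows are returned)
df_single = df.loc[df["Values"].eq(24.511), :]
print(df_single)
```

Because the flier values are taken from the same column they are matched against, the exact comparison done by isin works here; values typed in by hand may need a tolerance instead.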
To keep only some columns from the original df:
cols = ['data1', 'data2']
df_outliers = df.loc[df['Values'].isin(outlier_values), cols]
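A possible alternative, not part of the answer above, is to build a boolean mask directly in pandas and skip the value matching. The sketch below assumes the usual 1.5 × IQR rule, which is also the default whisker setting for matplotlib box plots. Again, the row with SN NO. 706 is made up for illustration.

```python
import pandas as pd

# Rows 701-705 come from the question; row 706 is a hypothetical outlier.
df = pd.DataFrame({
    "SN NO.": [701, 702, 703, 704, 705, 706],
    "Values": [24.511, 24.516, 24.517, 24.519, 24.511, 25.900],
    "data1": [3.556, 3.554, 3.553, 3.552, 3.551, 3.550],
    "data2": [3.557, 3.555, 3.556, 3.557, 3.558, 3.556],
})

# Quartiles and interquartile range of the 'Values' column
q1 = df["Values"].quantile(0.25)
q3 = df["Values"].quantile(0.75)
iqr = q3 - q1

# True for rows lying outside the 1.5 * IQR fences
is_outlier = (df["Values"] < q1 - 1.5 * iqr) | (df["Values"] > q3 + 1.5 * iqr)

# Keep only the columns of interest for the outlier rows
cols = ["SN NO.", "Values", "data1", "data2"]
df_outliers = df.loc[is_outlier, cols]
print(df_outliers)
```

The mask is aligned with the DataFrame index, so it selects whole rows without comparing floating-point values for equality.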