There are segments of readings that have faulty data, and I want to remove every entire day that has at least one. I have already created a column with True or False indicating whether that segment is wrong.
An example of the dataframe is below, since it has more than 100k rows:
                           power_c  power_g  temperature  to_delete
date_time
2019-01-01 00:00:00 00:00     2985        0         10.1      False
2019-01-01 00:05:00 00:00     2258        0         10.1       True
2019-01-01 01:00:00 00:00     2266        0         10.1      False
2019-01-02 00:15:00 00:00     3016        0         10.0      False
2019-01-03 01:20:00 00:00     2265        0         10.0       True
For example, the first and second rows belong to the same hour on the same day, and one of them has True, so I want to delete all rows of that day.
Data always comes in 5-minute intervals, so I tried to delete the 288 items after the True, but since the error is not at the start of the hour, it does not work as intended.
I am very new to programming and have tried a lot of different answers from everywhere; I would very much appreciate any help.
CodePudding user response:
Group by the date, then filter out groups that have at least one True in to_delete.
(df
    .groupby(df.index.date)
    .apply(lambda sf: None if sf['to_delete'].any() else sf)
    .reset_index(level=0, drop=True))
                           power_c  power_g  temperature  to_delete
date_time
2019-01-02 00:15:00 00:00     3016        0         10.0      False
I'm assuming date_time is a datetime type. If not, convert it first:
df.index = pd.to_datetime(df.index)
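If it helps, here is a self-contained sketch of the same idea written as a boolean mask instead of apply. It is an alternative, not the snippet above: it uses groupby(...).transform("any") to mark every row whose calendar day contains a flagged reading, then keeps the rest. The sample frame is rebuilt by hand from the rows in the question, and I'm assuming a timezone-naive index here; the names bad_day and clean are only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question.
# Assumption: the index is timezone-naive; the real data may carry a UTC offset.
df = pd.DataFrame(
    {
        "power_c": [2985, 2258, 2266, 3016, 2265],
        "power_g": [0, 0, 0, 0, 0],
        "temperature": [10.1, 10.1, 10.1, 10.0, 10.0],
        "to_delete": [False, True, False, False, True],
    },
    index=pd.DatetimeIndex(
        [
            "2019-01-01 00:00:00",
            "2019-01-01 00:05:00",
            "2019-01-01 01:00:00",
            "2019-01-02 00:15:00",
            "2019-01-03 01:20:00",
        ],
        name="date_time",
    ),
)

# normalize() sets every timestamp to midnight, so rows of one day share a key.
# transform("any") gives each row True if any row on its day is flagged.
bad_day = df.groupby(df.index.normalize())["to_delete"].transform("any")

# Keep only the rows whose day has no flagged reading.
clean = df[~bad_day]
print(clean)
```

Because the mask has the same index as df, the original date_time index is kept and no reset_index step is needed. It also avoids calling a Python lambda once per day, which may matter with more than 100k rows.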