Drop pandas rows based on percentage of valid data

Time:02-06

I have a pandas DataFrame that looks like this:

Date_Time level
2018-02-12 13:22:27 5
2018-02-12 13:17:27 7
2018-02-12 13:12:27 2
2018-02-12 13:07:27 6
2018-02-13 13:12:27 4
2018-02-13 13:17:27 5

How do I remove the rows for any date that has fewer than 4 entries? For example, since 2018-02-13 has fewer than 4 entries, its rows should be removed, giving this table:

Date_Time level
2018-02-12 13:22:27 5
2018-02-12 13:17:27 7
2018-02-12 13:12:27 2
2018-02-12 13:07:27 6

I tried using a for loop, but that takes too long to run.

CodePudding user response:

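You can group by the calendar date, use transform with count to broadcast each date's row count back onto its rows, and then use ge to keep only the rows whose date has at least 4 entries (this assumes Date_Time is already a datetime column, since the .dt accessor requires one):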

df[df.groupby(df['Date_Time'].dt.date)['Date_Time'].transform('count').ge(4)]
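
For reference, here is a minimal end-to-end sketch of that one-liner on the sample data from the question. The names min_entries, counts, and result are only illustrative, and the pd.to_datetime call is an assumption in case Date_Time was read in as strings:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date_Time": [
        "2018-02-12 13:22:27", "2018-02-12 13:17:27", "2018-02-12 13:12:27",
        "2018-02-12 13:07:27", "2018-02-13 13:12:27", "2018-02-13 13:17:27",
    ],
    "level": [5, 7, 2, 6, 4, 5],
})

# Assumption: the column may be stored as strings, so convert it first;
# the .dt accessor only works on datetime-like columns.
df["Date_Time"] = pd.to_datetime(df["Date_Time"])

# Illustrative threshold: keep dates with at least this many rows.
min_entries = 4

# Per-row count of how many rows share the same calendar date.
counts = df.groupby(df["Date_Time"].dt.date)["Date_Time"].transform("count")

# Boolean mask: True where the row's date meets the threshold.
result = df[counts.ge(min_entries)]
print(result)
```

Because transform returns a Series aligned with the original index, the mask can be applied directly to df with no Python-level loop, which is why this should scale much better than iterating over rows.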