How to filter in pandas dataframe in datetime column-CodePudding

I got a dataset that I work on and it had datetime column that looked like this (2012-01-01 00:00:00; 2012-01-01 01:00:00...). So I have data of every hour but I need to make analysis based on a day. So I created another column of only the date. Dataframe is kinda like this:

               Datetime        Date  Airquality(float)
0   2012-01-01 00:00:00  2012-01-01             120.00
1   2012-01-01 01:00:00  2012-01-01             150.12
2   2012-01-01 02:00:00  2012-01-01             140.00
..
40  2012-01-02 00:00:00  2012-01-02             130.54
41  2012-01-02 01:00:00  2012-01-02             110.33
...
99  2012-12-31 23:00:00  2012-12-31             190.76

Now, I need to extract all the days where the pollution was higher than 150 for the year of 2012 (in the dataset there is data for more than one year) and if I group only by the Date column then I get much more info than needed cause there are duplicates of course.

CodePudding user response：

Given:

              Datetime        Date  Airquality(float)
0  2012-01-01 00:00:00  2012-01-01             120.00
1  2012-01-01 01:00:00  2012-01-01             150.12
2  2012-01-01 02:00:00  2012-01-01             140.00
3  2012-01-02 00:00:00  2012-01-02             130.54
4  2012-01-02 01:00:00  2012-01-02             110.33
5  2012-12-31 23:00:00  2012-12-31             190.76

Doing:

# We don't actually need this column.
df = df.drop('Date', axis=1)
# This may be already done but...
df.Datetime = pd.to_datetime(df.Datetime)
# Set it as your index.
df = df.set_index('Datetime')

# Filter to just 2012, resample to daily
# taking the max of 'Airquality(float)'
# and filter to only show values > 150.
output = (df.loc[df.index.year == 2012]
            .resample('1d')['Airquality(float)']
            .max()[lambda x: x.gt(150)])
print(output)

Output:

Datetime
2012-01-01    150.12
2012-12-31    190.76
Name: Airquality(float), dtype: float64

CodePudding user response：

pd.Grouper should help you here, you can group_by the days
df.groupby(pd.Grouper(key="Datetime", freq="1D")).max() and then use a normal filter.