I got a dataset that I work on and it had datetime column that looked like this (2012-01-01 00:00:00; 2012-01-01 01:00:00...). So I have data of every hour but I need to make analysis based on a day. So I created another column of only the date. Dataframe is kinda like this:
Datetime Date Airquality(float)
0 2012-01-01 00:00:00 2012-01-01 120.00
1 2012-01-01 01:00:00 2012-01-01 150.12
2 2012-01-01 02:00:00 2012-01-01 140.00
..
40 2012-01-02 00:00:00 2012-01-02 130.54
41 2012-01-02 01:00:00 2012-01-02 110.33
...
99 2012-12-31 23:00:00 2012-12-31 190.76
Now, I need to extract all the days where the pollution was higher than 150 for the year of 2012 (in the dataset there is data for more than one year) and if I group only by the Date column then I get much more info than needed cause there are duplicates of course.
CodePudding user response:
Given:
Datetime Date Airquality(float)
0 2012-01-01 00:00:00 2012-01-01 120.00
1 2012-01-01 01:00:00 2012-01-01 150.12
2 2012-01-01 02:00:00 2012-01-01 140.00
3 2012-01-02 00:00:00 2012-01-02 130.54
4 2012-01-02 01:00:00 2012-01-02 110.33
5 2012-12-31 23:00:00 2012-12-31 190.76
Doing:
# We don't actually need this column.
df = df.drop('Date', axis=1)
# This may be already done but...
df.Datetime = pd.to_datetime(df.Datetime)
# Set it as your index.
df = df.set_index('Datetime')
# Filter to just 2012, resample to daily
# taking the max of 'Airquality(float)'
# and filter to only show values > 150.
output = (df.loc[df.index.year == 2012]
.resample('1d')['Airquality(float)']
.max()[lambda x: x.gt(150)])
print(output)
Output:
Datetime
2012-01-01 150.12
2012-12-31 190.76
Name: Airquality(float), dtype: float64
CodePudding user response:
pd.Grouper
should help you here, you can group_by the days
df.groupby(pd.Grouper(key="Datetime", freq="1D")).max()
and then use a normal filter.