I have a dataframe showing daily rainfall between 1973 and 2013 that looks like this:
tp1
time
1973-04-01 0.1
1973-07-01 0.4
1973-08-01 0.0
1973-12-01 0.5
1973-01-17 0.0
...
2013-10-09 0.0
2013-11-09 0.2
2013-12-09 0.0
2013-09-13 0.4
2013-09-14 0.0
[6432 rows x 1 columns]
I'm trying to find the maximum daily rainfall in each month of every year, then average those monthly maxima over the years, so the final dataframe will contain only 12 rows in the following format:
Average of Maximum Daily Rainfall in each Month
Jan x
Feb x
March x
April x
May x
June x
Jul x
....
I've tried the following command:
data = df.groupby(df.index.month).max()
However, I believe this computes the maximum of the monthly maxima across all years rather than the average of the maxima, which is what I want.
CodePudding user response:
You can first group by year and month to get the maximum in each month, then group by month to get the average across the years.
out = df.groupby(df.index.strftime('%Y-%m')).max()
out.index = pd.to_datetime(out.index)
print(out)
tp1
1973-01-01 0.0
1973-04-01 0.1
1973-07-01 0.4
1973-08-01 0.0
1973-12-01 0.5
2013-09-01 0.4
2013-10-01 0.0
2013-11-01 0.2
2013-12-01 0.0
res = out.groupby(out.index.strftime('%b')).mean()
print(res)
tp1
Apr 0.10
Aug 0.00
Dec 0.25
Jan 0.00
Jul 0.40
Nov 0.20
Oct 0.00
Sep 0.40
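Note that grouping on the '%b' strings sorts the months alphabetically (Apr, Aug, Dec, ...). Here is a minimal sketch of one way to get calendar order and all 12 rows instead, assuming df has a DatetimeIndex. The values are made-up toy data rather than the asker's, and the names monthly_max, avg and MONTHS are only illustrative.

```python
import pandas as pd

# Toy data (made-up values) with a DatetimeIndex, standing in for the real df.
df = pd.DataFrame(
    {"tp1": [0.1, 0.4, 0.6, 0.3, 0.2, 0.0]},
    index=pd.to_datetime([
        "1973-01-05", "1973-01-20", "1973-02-03",
        "2013-01-09", "2013-02-11", "2013-02-25",
    ]),
)

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Maximum daily rainfall for each (year, month) pair.
monthly_max = df["tp1"].groupby([df.index.year, df.index.month]).max()

# Average those maxima over the years, keyed by month number (level 1).
avg = monthly_max.groupby(level=1).mean()

# Force all 12 months in calendar order; months with no data become NaN.
avg = avg.reindex(range(1, 13))
avg.index = MONTHS
print(avg)
```

Grouping on the month number rather than the month name is what keeps the rows in calendar order. The names are attached only at the end.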
CodePudding user response:
import pandas as pd
#### reset the index (in the question the index is named 'time')
df = df.reset_index().rename(columns={'time': 'timestamp'})
#### (optional) in case the timestamp is in str format, convert it to datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])
#### get month and year
df['month'] = df['timestamp'].dt.strftime('%b')
df['year'] = df['timestamp'].dt.year
#### get the maximum rain for each month and year
df2 = df.groupby(['month', 'year'])['tp1'].max().reset_index()
#### average over the years for each month
df2 = df2.groupby('month')['tp1'].mean().reset_index()
#### Note: I didn't try the above code, but I hope you can get some idea from it. TQ
CodePudding user response:
I think you want to groupby both year and month to take the max and then groupby month to take the mean:
import pandas
df = pandas.DataFrame(
{'tp1': [.1, .4, 0, .5, .0, .0, .2, .0, .4, .0]},
index = pandas.to_datetime([
'1973-04-01',
'1973-07-01',
'1973-08-01',
'1973-12-01',
'1973-01-17',
'2013-07-09',
'2013-08-09',
'2013-12-09',
'2013-12-13',
'2013-01-14',
])
)
result = (
df
.groupby([df.index.year, df.index.month])['tp1']
.max()
.reset_index()
.groupby('level_1')['tp1']
.mean()
)
result.index = pandas.to_datetime(result.index, format='%m').strftime('%b')
print(result)
Output:
level_1
Jan 0.00
Apr 0.10
Jul 0.20
Aug 0.10
Dec 0.45
Name: tp1, dtype: float64
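One caveat: in the question the index is named time. In that case both group keys inherit that name, so reset_index() would likely fail on the duplicate name and there would be no level_1 column to group on. Below is a sketch of a variant that names the levels explicitly, assuming a DatetimeIndex named time. The values are toy data, and the level names year and month are my own choice.

```python
import pandas as pd

# Toy data (made-up values); the index is named "time" as in the question.
df = pd.DataFrame(
    {"tp1": [0.1, 0.4, 0.0, 0.5, 0.2, 0.4]},
    index=pd.DatetimeIndex(
        ["1973-07-01", "1973-07-15", "1973-12-01",
         "2013-07-09", "2013-12-09", "2013-12-13"],
        name="time",
    ),
)

result = (
    df["tp1"]
    .groupby([df.index.year, df.index.month])  # one group per (year, month)
    .max()                                     # monthly maximum in each year
    .rename_axis(["year", "month"])            # give the two levels distinct names
    .groupby(level="month")                    # collapse the years
    .mean()                                    # average of the monthly maxima
)
print(result)
```

Staying on the MultiIndex and grouping by level avoids reset_index() altogether, so the code does not depend on how the original index was named.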
CodePudding user response:
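Convert the time column to datetime so that you can use the date accessors on it.
Then create temporary columns for the year and month, and use groupby to get the maximum for each year and month combination. (This assumes time is a regular column; if it is the index, call df = df.reset_index() first.)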
df['time'] = pd.to_datetime(df['time'])
df.assign(yr=df['time'].dt.year,
mt=df['time'].dt.month,
mth=df['time'].dt.strftime('%b')).groupby(['yr','mt','mth'])['tp1'].max().reset_index()
yr mt mth tp1
0 1973 1 Jan 0.0
1 1973 4 Apr 0.1
2 1973 7 Jul 0.4
3 1973 8 Aug 0.0
4 1973 12 Dec 0.5
5 2013 9 Sep 0.4
6 2013 10 Oct 0.0
7 2013 11 Nov 0.2
8 2013 12 Dec 0.0
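The snippet above stops at the per-month maxima for each year. One possible way to finish the job is to group those maxima by month and take the mean. This is only a sketch, assuming time is a regular column and the rainfall column is named tp1 as in the question. The values are toy data, and monthly_max and avg are illustrative names.

```python
import pandas as pd

# Toy data (made-up values) with "time" as an ordinary string column.
df = pd.DataFrame({
    "time": ["1973-01-04", "1973-01-17", "1973-09-02",
             "2013-01-10", "2013-09-13", "2013-09-14"],
    "tp1": [0.1, 0.5, 0.2, 0.3, 0.4, 0.0],
})
df["time"] = pd.to_datetime(df["time"])

# Maximum daily rainfall for each year and month combination.
monthly_max = (
    df.assign(yr=df["time"].dt.year,
              mt=df["time"].dt.month,
              mth=df["time"].dt.strftime("%b"))
      .groupby(["yr", "mt", "mth"])["tp1"]
      .max()
      .reset_index()
)

# Average the maxima over the years; grouping on mt as well as mth
# keeps the months in calendar order rather than alphabetical order.
avg = monthly_max.groupby(["mt", "mth"])["tp1"].mean().reset_index()
print(avg)
```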