How do I transform a pandas daily dataframe into average of monthly maximums?

I have a dataframe showing daily rainfall between 1973 and 2013 that looks like this:

            tp1
time           
1973-04-01  0.1
1973-07-01  0.4
1973-08-01  0.0
1973-12-01  0.5
1973-01-17  0.0
        ...
2013-10-09  0.0
2013-11-09  0.2
2013-12-09  0.0
2013-09-13  0.4
2013-09-14  0.0

[6432 rows x 1 columns]

I'm trying to find the maximum daily rainfall in each month of every year, then average those monthly maxima across the years, so the final dataframe will contain only 12 rows, in the following format:

      Average of Maximum Daily Rainfall in each Month
Jan       x
Feb       x
March     x 
April     x
May       x
June      x
Jul       x
....

I've tried the following command:

data = df.groupby(df.index.month).max()

However, I believe this computes the maximum of the monthly maximums across all years, rather than the average of the maximums, which is what I want.

CodePudding user response：

You can first group by year and month to get the maximum within each month, then group by month to get the average across years.

out = df.groupby(df.index.strftime('%Y-%m')).max()
out.index = pd.to_datetime(out.index)

print(out)

            tp1
1973-01-01  0.0
1973-04-01  0.1
1973-07-01  0.4
1973-08-01  0.0
1973-12-01  0.5
2013-09-01  0.4
2013-10-01  0.0
2013-11-01  0.2
2013-12-01  0.0

res = out.groupby(out.index.strftime('%b')).mean()

print(res)

      tp1
Apr  0.10
Aug  0.00
Dec  0.25
Jan  0.00
Jul  0.40
Nov  0.20
Oct  0.00
Sep  0.40
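
Note that grouping on the '%b' strings sorts the result alphabetically (Apr, Aug, Dec, ...), as the output above shows. Here is a minimal sketch of one way to get calendar order instead: group on the month number first, then relabel with calendar.month_abbr. The frame below is only a stand-in for `out`, rebuilt from the sample values printed above so the block runs on its own.

```python
import calendar

import pandas as pd

# Monthly maxima copied from the sample output above (stand-in for `out`).
out = pd.DataFrame(
    {"tp1": [0.0, 0.1, 0.4, 0.0, 0.5, 0.4, 0.0, 0.2, 0.0]},
    index=pd.to_datetime([
        "1973-01-01", "1973-04-01", "1973-07-01", "1973-08-01", "1973-12-01",
        "2013-09-01", "2013-10-01", "2013-11-01", "2013-12-01",
    ]),
)

# Group on the integer month so the result sorts January to December.
res = out.groupby(out.index.month).mean()

# Swap the month numbers for abbreviations only after the order is settled.
res.index = [calendar.month_abbr[m] for m in res.index]

print(res)
```

Months with no data in the sample (Feb, Mar, ...) are simply absent; with the full 1973-2013 data all 12 rows should appear.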

CodePudding user response：

####reset the index

df = df.reset_index().rename(columns = {'time':'timestamp'})

####(optional) in case the timestamp is in str format (convert to datetime format)

df['timestamp'] = pd.to_datetime(df['timestamp'])

####get month and year

df['month'] = df['timestamp'].dt.strftime('%b')

df['year'] = df['timestamp'].dt.strftime('%Y')

####get the maximum rain for each month_year

df2 = df.groupby(['month','year'])['tp1'].max().reset_index()

####avg according to month

df2 = df2.groupby('month')['tp1'].mean().reset_index()

####note: I haven't run the code above, but I hope it gives you the idea. Thanks.
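
Since the answer above was not run by its author, here is a small sketch of the same column-based steps on a few made-up rows. The sample values are illustrative only (not from the asker's dataset), and the sketch assumes the index is named `time` as in the question.

```python
import pandas as pd

# Hypothetical sample: two years, two months, a few days each.
df = pd.DataFrame(
    {"tp1": [0.1, 0.3, 0.0, 0.5, 0.2, 0.4]},
    index=pd.DatetimeIndex(
        ["1973-01-05", "1973-01-20", "1973-02-10",
         "1974-01-03", "1974-02-07", "1974-02-21"],
        name="time",
    ),
)

# Move the index into a regular column and derive month/year from it.
df = df.reset_index().rename(columns={"time": "timestamp"})
df["month"] = df["timestamp"].dt.strftime("%b")
df["year"] = df["timestamp"].dt.year

# Step 1: maximum daily rainfall for each (month, year) pair.
monthly_max = df.groupby(["month", "year"])["tp1"].max().reset_index()

# Step 2: average those maxima per month across the years.
avg_of_max = monthly_max.groupby("month")["tp1"].mean().reset_index()

print(avg_of_max)
```

Here January averages the maxima 0.3 and 0.5, and February averages 0.0 and 0.4. Because the grouping key is the month abbreviation, the rows come out in alphabetical rather than calendar order.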

CodePudding user response：

I think you want to groupby both year and month to take the max and then groupby month to take the mean:

import pandas

df = pandas.DataFrame(
    {'tp1': [.1, .4, 0, .5, .0, .0, .2, .0, .4, .0]},
    index = pandas.to_datetime([
        '1973-04-01',
        '1973-07-01',
        '1973-08-01',
        '1973-12-01',
        '1973-01-17',
        '2013-07-09',
        '2013-08-09',
        '2013-12-09',
        '2013-12-13',
        '2013-01-14',
    ])
)
result = (
    df
    .groupby([df.index.year, df.index.month])['tp1']
    .max()
    .reset_index()
    .groupby('level_1')['tp1']
    .mean()
)
result.index = pandas.to_datetime(result.index, format='%m').strftime('%b')
print(result)

output

level_1
Jan    0.00
Apr    0.10
Jul    0.20
Aug    0.10
Dec    0.45
Name: tp1, dtype: float64
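
One caveat: in the question the index is named `time`, so both group keys would inherit that name, and `.reset_index()` would likely fail on the duplicate name instead of producing `level_0`/`level_1`. Below is a sketch that sidesteps this by naming the keys explicitly and grouping on the index level rather than resetting it. The names `year` and `month` and the four sample rows are my own choices, not part of the answer.

```python
import pandas as pd

# Small sample with a named index, as in the question.
df = pd.DataFrame(
    {"tp1": [0.1, 0.5, 0.0, 0.4]},
    index=pd.DatetimeIndex(
        ["1973-04-01", "1973-12-01", "2013-12-09", "2013-12-13"], name="time"
    ),
)

# Give each group key its own name so the resulting levels are distinct.
year = df.index.year.rename("year")
month = df.index.month.rename("month")

# Maximum per (year, month), then mean of those maxima per month.
monthly_max = df.groupby([year, month])["tp1"].max()
result = monthly_max.groupby(level="month").mean()

print(result)
```

The result is indexed by month number, so it stays in calendar order; the abbreviations can be applied afterwards as in the answer.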

CodePudding user response：

Convert the time column to datetime so that the date accessors can be used on it.

Then create temporary variables for year and month, and use groupby to get the maximum for each year and month combination.

df = df.reset_index()  # 'time' is the index in the question; make it a column first

df['time'] = pd.to_datetime(df['time'])


df.assign(yr=df['time'].dt.year,
          mt=df['time'].dt.month,
          mth=df['time'].dt.strftime('%b')).groupby(['yr','mt','mth'])['tp1'].max().reset_index()

      yr  mt    mth     tp1
0   1973   1    Jan     0.0
1   1973   4    Apr     0.1
2   1973   7    Jul     0.4
3   1973   8    Aug     0.0
4   1973  12    Dec     0.5
5   2013   9    Sep     0.4
6   2013  10    Oct     0.0
7   2013  11    Nov     0.2
8   2013  12    Dec     0.0
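
The frame above holds the per-year monthly maxima; the question also asks for their average across years. Here is a sketch of that last step, assuming the result above is stored in a variable (called `monthly_max` here, a name not used in the answer). It is rebuilt from the printed rows so the block is self-contained, with the month number `mt` kept alongside, as the groupby keys imply, and the rainfall column named `tp1` as in the question.

```python
import pandas as pd

# Per-year monthly maxima, copied from the rows printed above.
monthly_max = pd.DataFrame({
    "yr": [1973, 1973, 1973, 1973, 1973, 2013, 2013, 2013, 2013],
    "mt": [1, 4, 7, 8, 12, 9, 10, 11, 12],
    "mth": ["Jan", "Apr", "Jul", "Aug", "Dec", "Sep", "Oct", "Nov", "Dec"],
    "tp1": [0.0, 0.1, 0.4, 0.0, 0.5, 0.4, 0.0, 0.2, 0.0],
})

# Keep the month number in the key so the rows sort in calendar order,
# then average the maxima across years.
avg = monthly_max.groupby(["mt", "mth"])["tp1"].mean().reset_index()

print(avg[["mth", "tp1"]])
```

Only December appears in both sample years, so it is the only row where the mean differs from a single year's maximum.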