Home > database >  Grouping of a dataframe monthly after calculating the highest daily values
Grouping of a dataframe monthly after calculating the highest daily values

Time:08-24

I've got a dataframe with two columns one is datetime dataframe consisting of dates, and another one consists of quantity. It looks like something like this,

       Date   Quantity
0   2019-01-05  10
1   2019-01-10  15
2   2019-01-22  14
3   2019-02-03  12
4   2019-05-11  25
5   2019-05-21  4
6   2019-07-08  1
7   2019-07-30  15
8   2019-09-05  31
9   2019-09-10  44
10  2019-09-25  8
11  2019-12-09  10
12  2020-04-11  111
13  2020-04-17  5
14  2020-06-05  17
15  2020-06-16  12
16  2020-06-22  14

I want to make another dataframe. It should consist of two columns one is Month/Year and the other is Till Highest. I basically want to calculate the highest quantity value until that month and group it using month/year. Example of what I want precisely is,

    Month/Year  Till Highest
0   Jan/2019    15
1   Feb/2019    15
2   May/2019    25
3   Jul/2019    25
4   Sep/2019    44
5   Dec/2019    44
6   Apr/2020    111
7   Jun/2020    111

In my case, the dataset is vast, and I've readings of almost every day of each month and each year in the specified timeline. Here I've made a dummy dataset to show an example of what I want.

Please help me with this. Thanks in advance :)

CodePudding user response:

See the annotated code:

(df
 # convert date to monthly period (2019-01)
 .assign(Date=pd.to_datetime(df['Date']).dt.to_period('M'))
 # period and max quantity per month
 .groupby('Date')
 .agg(**{'Month/Year': ('Date', 'first'),
         'Till highest': ('Quantity', 'max')})
 # format periods as Jan/2019 and get cumulated max quantity
 .assign(**{
     'Month/Year': lambda d: d['Month/Year'].dt.strftime('%b/%Y'),
     'Till highest': lambda d: d['Till highest'].cummax()
           })
 # drop the groupby index
 .reset_index(drop=True)
)

output:

  Month/Year  Till highest
0   Jan/2019            15
1   Feb/2019            15
2   May/2019            25
3   Jul/2019            25
4   Sep/2019            44
5   Dec/2019            44
6   Apr/2020           111
7   Jun/2020           111

CodePudding user response:

In R you can use cummax:

df=data.frame(Date=c("2019-01-05","2019-01-10","2019-01-22","2019-02-03","2019-05-11","2019-05-21","2019-07-08","2019-07-30","2019-09-05","2019-09-10","2019-09-25","2019-12-09","2020-04-11","2020-04-17","2020-06-05","2020-06-16","2020-06-22"),Quantity=c(10,15,14,12,25,4,1,15,31,44,8,10,111,5,17,12,14))

data.frame(`Month/Year`=unique(format(as.Date(df$Date),"%b/%Y")),
  `Till Highest`=cummax(tapply(df$Quantity,sub("-..$","",df$Date),max)),
  check.names=F,row.names=NULL)

  Month/Year Till Highest
1   Jan/2019           15
2   Feb/2019           15
3   May/2019           25
4   Jul/2019           25
5   Sep/2019           44
6   Dec/2019           44
7   Apr/2020          111
8   Jun/2020          111
  • Related