Home > Software engineering >  How to do Resampling of panel data from daily to monthly with sums and averages?
How to do Resampling of panel data from daily to monthly with sums and averages?

Time:05-06

I am working with a COVID-19 dataset that looks as follows:

Date City City ID State Estimated Population Estimated Population_2019 Confirmed Rate Death Rate New Confirmed New Deaths
2020-03-17 Rio Branco 10002 AC 413418 407319 0.72566 0.01 3 0
2020-03-17 Manaus 12330 AM 555550 555964 0.65433 0.005 5 3

The date is my index. I have multiple cities with equal dates as seen.

Given that I have daily datapoints, I am trying to resample my data such that I have monthly points. I have tried using the resample command but I am having trouble because I want some of my columns to be the same and sum and some to be the mean. More specifically:

  • City,City ID, State: Will remain the same as they are IDs
  • Estimated Population and Estimated_population: I would like to take the mean for each of these columns, and these will be the new monthly values
  • Confirmed Rate and Death: I would like to take the monthly mean of these and have these values to be my monthly values and I would like to create new columns giving the monthly standard deviation for my confirmed rate and death rate.
  • For New Confirmed and New Deaths: I would like to add these values and have my monthly point to be the sum of new cases and deaths, on two separate columns.

How can I go about making a code that is able to differentiate which columns to add, which to take the mean, and how can I create two new columns for the standard deviations of Confirmed and Death Rates?

CodePudding user response:

You should explore a combination of groupby with .agg. Something like this should work

df_grouped=df.groupby([df.index.month,'City ID']).agg({'Estimated Population':'mean','Estimated Population_2019':'mean','Confirmed Rate':['mean','std'],'Death Rate':['mean','std'],'New Confirmed':'sum','New Deaths':'sum'})

df_grouped.index.rename(['Month','City ID'],inplace=True)
  • Related