Convert daily data to weekly by taking average of the 7 days

Time:10-04

I've created the following dataframe from the data provided at the CDC link.

import pandas as pd

# The CSV has two preamble lines, so the column names are on row 2
googledata = pd.read_csv('/content/data_table_for_daily_case_trends__the_united_states.csv', header=2)
# Inspect data
googledata.head()
id  State          Date         New Cases
0   United States  Oct 2 2022   11553
1   United States  Oct 1 2022   8024
2   United States  Sep 30 2022  46383
3   United States  Sep 29 2022  89873
4   United States  Sep 28 2022  63763

After converting the Date column to datetime, I used a boolean mask to keep only the last year of data:

googledata['Date'] = pd.to_datetime(googledata['Date'])

df = googledata
start_date = '2021-10-1'
end_date = '2022-10-1'
mask = (df['Date'] > start_date) & (df['Date'] <= end_date)
  
df = df.loc[mask]

The problem is that this data is daily, and I want it weekly, i.e. I want to reduce the 365 rows to 52 rows, each holding the mean of New Cases over the 7 days of that week.

I tried the following method from a previous post (link), but I don't think I am even applying it correctly, because the code never asks me to put my dataframe anywhere:

logic = {'New Cases'  : 'mean'}

offset = pd.offsets.timedelta(days=-6)

f = pd.read_clipboard(parse_dates=['Date'], index_col=['Date'])
f.resample('W', loffset=offset).apply(logic)

But I am getting the following error:

AttributeError: module 'pandas.tseries.offsets' has no attribute 'timedelta'

CodePudding user response:

If I understand correctly, you want to resample:

df = df.set_index("Date")
df.index = df.index - pd.tseries.frequencies.to_offset("6D")
df = df.resample("W").agg({"New Cases": "mean"}).reset_index()
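As for the AttributeError in the question: pd.offsets has no timedelta attribute; the pandas timedelta class is pd.Timedelta. The loffset argument was also deprecated in pandas 1.1 and removed in 2.0, so on a recent version the labels have to be shifted after resampling. Here is a minimal sketch of that variant, run on made-up data rather than the real CDC file (the dates and values below are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CDC table: 4 full Monday-to-Sunday weeks.
# The values are made up so the weekly means are easy to check by hand.
dates = pd.date_range("2022-08-01", "2022-08-28", freq="D")
df = pd.DataFrame({"Date": dates, "New Cases": np.arange(len(dates))})

# resample("W") builds weekly bins ending on Sunday and labels each bin
# with that Sunday; take the mean of the 7 daily values in each bin.
weekly = df.set_index("Date").resample("W")["New Cases"].mean()

# Replacement for the removed loffset argument: move each label from the
# closing Sunday back to the Monday that opens the week.
weekly.index = weekly.index - pd.Timedelta(days=6)
weekly = weekly.reset_index()

print(weekly)
```

With the real data you would skip the synthetic frame and start from your masked df. Note that the first and last bins can hold fewer than 7 days if the date range does not start on a Monday and end on a Sunday.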

CodePudding user response:

You can use strftime to convert each date to a year-week label before applying groupby:

df['Week'] = df['Date'].dt.strftime('%Y-%U')
df.groupby('Week')['New Cases'].mean()
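One thing to watch with the '%Y-%U' label: %U counts weeks from the first Sunday of the year, so the days before that Sunday land in week 00 and a week that spans New Year gets split into two groups. If that matters for your range, grouping on a weekly period is one possible alternative. A small sketch on made-up data (the dates and values are assumptions for illustration, not the CDC numbers):

```python
import numpy as np
import pandas as pd

# Two full Monday-to-Sunday weeks that straddle New Year (synthetic values).
dates = pd.date_range("2021-12-27", "2022-01-09", freq="D")
df = pd.DataFrame({"Date": dates, "New Cases": np.arange(len(dates))})

# Year-week string labels: the week containing Jan 1 is split across
# the 2021 and 2022 labels, so some groups cover fewer than 7 days.
by_label = df.groupby(df["Date"].dt.strftime("%Y-%U"))["New Cases"].mean()

# Weekly periods (Monday to Sunday by default) ignore the year boundary,
# so each group here covers a full 7 days.
by_period = df.groupby(df["Date"].dt.to_period("W"))["New Cases"].mean()

print(by_label)
print(by_period)
```

Both approaches give the same result for weeks that sit entirely inside one calendar year, apart from where the week starts (Sunday for %U, Monday for the default weekly period).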