I've created the following DataFrame from the data given at the CDC link:
googledata = pd.read_csv('/content/data_table_for_daily_case_trends__the_united_states.csv', header=2)
# Inspect data
googledata.head()
id | State | Date | New Cases |
---|---|---|---|
0 | United States | Oct 2 2022 | 11553 |
1 | United States | Oct 1 2022 | 8024 |
2 | United States | Sep 30 2022 | 46383 |
3 | United States | Sep 29 2022 | 89873 |
4 | United States | Sep 28 2022 | 63763 |
After converting the Date column to datetime, I used a boolean mask to keep only the last year of data:
googledata['Date'] = pd.to_datetime(googledata['Date'])
df = googledata
start_date = '2021-10-1'
end_date = '2022-10-1'
mask = (df['Date'] > start_date) & (df['Date'] <= end_date)
df = df.loc[mask]
The problem is that the data is daily, and I want it weekly instead, i.e. I want to convert the 365 rows into 52 rows, one per week, each holding the mean of New Cases over the 7 days of that week.
I tried the following method from a previous post (link), but I don't think I am applying it correctly, because this code never asks for my DataFrame anywhere:
logic = {'New Cases' : 'mean'}
offset = pd.offsets.timedelta(days=-6)
f = pd.read_clipboard(parse_dates=['Date'], index_col=['Date'])
f.resample('W', loffset=offset).apply(logic)
But I am getting the following error:
AttributeError: module 'pandas.tseries.offsets' has no attribute 'timedelta'
CodePudding user response:
If I understand correctly, you want to resample the daily data into weekly means:
df = df.set_index("Date")
df.index = df.index - pd.tseries.frequencies.to_offset("6D")
df = df.resample("W").agg({"New Cases": "mean"}).reset_index()
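As for the error in the question: `pd.offsets` has no `timedelta` attribute, while `pd.Timedelta` does exist. The `loffset` argument of `resample` was also deprecated in pandas 1.1 and removed in 2.0, so shifting the labels yourself is the safer route. Here is a minimal self-contained sketch of a variant that resamples first and relabels afterwards. The 14 rows are synthetic stand-in data, not the CDC file, and the Monday-to-Sunday week is just what `"W"` gives by default:

```python
import pandas as pd

# Synthetic stand-in for the CDC table: 14 consecutive days, New Cases = 0..13.
# 2022-09-05 is a Monday, so the data covers exactly two Monday-Sunday weeks.
dates = pd.date_range("2022-09-05", periods=14, freq="D")
df = pd.DataFrame({"Date": dates, "New Cases": range(14)})

weekly = (
    df.set_index("Date")
      .resample("W")                  # "W" means weeks ending on Sunday
      .agg({"New Cases": "mean"})
      .reset_index()
)

# Each row is labelled with the Sunday that ends the week;
# move the label back 6 days so it shows the Monday that starts it.
weekly["Date"] = weekly["Date"] - pd.Timedelta(days=6)

print(weekly)
```

With your own data you would skip the synthetic frame and start from the masked `df`. If the first or last week of the one-year window is incomplete, its mean is taken over fewer than 7 days.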
CodePudding user response:
You can use strftime to convert each date to a year-week label before applying groupby:
df['Week'] = df['Date'].dt.strftime('%Y-%U')
df.groupby('Week')['New Cases'].mean()
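One caveat with `%U`: it counts weeks within a calendar year (Sunday as the first day, with the days before the first Sunday falling in week 00), so a week that straddles New Year is split across several labels. Since the window here runs from October 2021 to October 2022, that affects the week around 1 January. A small sketch of the effect on made-up data, with `dt.to_period("W")` as one possible alternative that keeps each Monday-to-Sunday week together:

```python
import pandas as pd

# Made-up data: one Monday-Sunday week that crosses the year boundary.
dates = pd.date_range("2021-12-27", "2022-01-02", freq="D")
df = pd.DataFrame({"Date": dates, "New Cases": range(7)})

# Year-week labels: the single week is split into three groups.
by_label = df.groupby(df["Date"].dt.strftime("%Y-%U"))["New Cases"].mean()

# Weekly periods: the same seven days stay in one group.
by_period = df.groupby(df["Date"].dt.to_period("W"))["New Cases"].mean()

print(by_label)
print(by_period)
```

Which grouping is right depends on whether you want calendar-year week numbers or continuous 7-day weeks.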