(https://i.stack.imgur.com/58D8K.png)
import pandas as pd
df = pd.read_csv('1410001701eng.csv')
df.head()
df['date'] = pd.to_datetime(df['Age group'])
df['year'] = pd.DatetimeIndex(df['date']).year
monthly_year_avg = df.groupby('year')['VALUE'].mean()
print(monthly_year_avg)
This is my code. Could you please give me a hint, or point me to a site with similar questions? I have monthly data from Jan-1978 to November-2022. How can I convert the monthly data for the different age groups to annual values by taking the average?
Or do you think I should calculate it one by one in Excel, since it is only 44 years?
Thank you very much! Much appreciated.
I searched for similar questions on Reddit and Stack Overflow; they all used resample to get the result.
CodePudding user response:
This should give you a new pandas dataframe with the yearly means. It assumes every column of df is a month column, in chronological order starting with January. Note that the if branch subtracts 1 from the time step to account for the missing December column in 2022.
import numpy as np
import pandas as pd

new_df = pd.DataFrame()  # create empty pandas dataframe
time_step = 12  # months per year
for i in np.arange(0, len(df.columns), time_step):
    new_header = df.columns[i][-2:]  # last two characters of the header (the year)
    if new_header == str(22):  # if the year is 2022
        sliced_for_mean = df.iloc[:, i:i + time_step - 1]  # take one off the last step (no December column)
        new_df[new_header] = sliced_for_mean.mean(axis=1)  # row means appended to new_df
    else:  # every other year
        sliced_for_mean = df.iloc[:, i:i + time_step]  # slice df to calculate the mean for the year
        new_df[new_header] = sliced_for_mean.mean(axis=1)  # row means appended to new_df
print(new_df)
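The loop above depends on column positions, so it breaks if a month is missing in the middle of the data. As a rough alternative sketch, the month columns can be grouped by the year suffix of their header instead. The small `wide` frame, its headers like "Jan-21" and the age-group labels are made-up illustration data, not the asker's file:

```python
import pandas as pd

# Hypothetical wide table: one row per age group, one column per month.
wide = pd.DataFrame(
    {
        "Jan-21": [10.0, 20.0],
        "Feb-21": [30.0, 40.0],
        "Jan-22": [50.0, 60.0],  # partial year: only one month available
    },
    index=["15-24", "25-54"],
)

# Year label for each column: the last two characters of the header.
years = wide.columns.str[-2:]

# Transpose so months become rows, group those rows by year label,
# take the mean, then transpose back to one column per year.
yearly = wide.T.groupby(years).mean().T
print(yearly)
```

A partial year such as 2022 needs no special case here, because each year is averaged over whatever month columns it actually has.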
CodePudding user response:
Melt the month columns into a single "month" column, extract the year from each month label, then aggregate by year:
import pandas as pd

df = pd.DataFrame(data=[
    ["M", "21-30", None, None, 15000, 21000, 22500, 21800, None, None, None],
    ["M", "31-40", 18000, 19200, 19000, None, None, 21800, 21500, 22300, 22000],
    ["M", "41-50", 22200, None, 15000, 21000, 22500, 21800, None, None, 22000],
], columns=["gender", "age_group", "Nov-20", "Dec-20", "Mar-21", "Apr-21", "May-21", "Jun-21", "Jan-22", "Feb-22", "Mar-22"])
df = df.fillna(0)  # note: missing months are counted as 0 in the average
df = df.melt(id_vars=["gender", "age_group"], value_vars=df.drop(["gender", "age_group"], axis=1).columns, var_name="month", value_name="value")
df["year"] = df["month"].str.split("-").str[1]  # "Nov-20" -> "20"
df = df.groupby(["gender", "age_group", "year"]).agg(avg=("value", "mean")).reset_index()
[Out]
  gender age_group year           avg
0      M     21-30   20      0.000000
1      M     21-30   21  20075.000000
2      M     21-30   22      0.000000
3      M     31-40   20  18600.000000
4      M     31-40   21  10200.000000
5      M     31-40   22  21933.333333
6      M     41-50   20  11100.000000
7      M     41-50   21  20075.000000
8      M     41-50   22   7333.333333
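Note that filling missing months with 0 pulls the averages down: age group 31-40 gets 10200 for year 21 even though its two observed values are 19000 and 21800. If missing months should be skipped instead, a possible variant is to leave them as NaN, since the pandas mean ignores NaN by default. The sketch below reuses two rows of the sample data and also parses the labels with `pd.to_datetime`, assuming English month abbreviations in the form "Nov-20"; the names `long_df` and `yearly` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ["M", "21-30", None, None, 15000, 21000, 22500, 21800, None, None, None],
        ["M", "31-40", 18000, 19200, 19000, None, None, 21800, 21500, 22300, 22000],
    ],
    columns=["gender", "age_group", "Nov-20", "Dec-20", "Mar-21", "Apr-21",
             "May-21", "Jun-21", "Jan-22", "Feb-22", "Mar-22"],
)

# Wide to long: one row per (gender, age_group, month).
long_df = df.melt(id_vars=["gender", "age_group"], var_name="month", value_name="value")

# Make sure the values are numeric; missing entries become NaN.
long_df["value"] = pd.to_numeric(long_df["value"])

# Parse labels such as "Nov-20" and keep the four-digit year.
long_df["year"] = pd.to_datetime(long_df["month"], format="%b-%y").dt.year

# mean() skips NaN, so each year is averaged over its observed months only.
yearly = (
    long_df.groupby(["gender", "age_group", "year"])["value"]
    .mean()
    .reset_index(name="avg")
)
print(yearly)
```

With this variant a year with no observations at all comes out as NaN rather than 0, which makes the gaps in the data visible in the result.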