This is what my data looks like:
month total_mobile_subscription
0 1997-01 414000
1 1997-02 423000
2 1997-03 431000
3 1997-04 479000
4 1997-05 510000
.. ... ...
279 2020-04 9056300
280 2020-05 8928800
281 2020-06 8860000
282 2020-07 8768500
283 2020-08 8659000
[284 rows x 2 columns]
Basically, I'm trying to change this into a dataset sorted by year with the value being the mean average of total mobiles subscriptions for each year.
I am not sure what to do as I am still learning this.
CodePudding user response:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'year': ['1997-01', '1997-02', '1997-03', '1998-01', '1998-02', '1998-03'],
'sale': [500, 1000, 1500, 2000, 1000, 400]
})
for a in range(1,13):
x = '-0' str(a)
df['year'] = df['year'].str.replace(x, '')
df2 = (df.groupby(['year']).mean('sale'))
print(df2)
CodePudding user response:
Convert values of month
to datetimes and aggregate by years:
y = pd.to_datetime(df['month']).dt.year.rename('year')
df1 = df.groupby(y)['total_mobile_subscription'].mean().reset_index()
df['month'] = pd.to_datetime(df['month'])
df1 = (df.groupby(df['month'].dt.year.rename('year'))['total_mobile_subscription']
.mean().reset_index())
Or aggregate by first 4 values of month column:
df2 = (df.groupby(df['month'].str[:4].rename('year'))['total_mobile_subscription']
.mean().reset_index())