I have a DataFrame like this:
data_date value
2016-01-01 1
2016-01-02 2
2017-02-05 3
2017-02-07 4
2017-03-09 5
I need to convert it to a table with years as the index and months as the columns. (aggregate using sum)
The final output should look like this
Jan | Feb | Mar | Apr | ........... Dec |
2016 3 | xx | xx | xx | ............ |
2017 xx | 7 | 5 | xx | ............ |
Here it what I did:
To make it easy to copy:
import pandas as pd
df=pd.DataFrame([
{'data_date': '2016-01-01', 'value': 1},
{'data_date': '2016-01-02', 'value': 2},
{'data_date': '2017-02-05', 'value': 3},
{'data_date': '2017-02-07', 'value': 4},
{'data_date': '2017-03-09', 'value': 5}])
I used TimeGrouper to first aggregate it to monthly like so:
df['data_date'] = pd.to_datetime(df['data_date'])
df.set_index('data_date', inplace=True)
grp_df = df.groupby([pd.Grouper(freq='M')]).sum()
So now I have data aggregated to each row as a month/Yr. I'm stuck as to how to make months the columns and year the row.
Can you help me with that please?
CodePudding user response:
Try pivot_table:
(df.assign(year=df.data_date.dt.year, month=df.data_date.dt.strftime('%b'))
.pivot_table(index='year', columns='month', values='value', aggfunc='sum')
.reindex(['Jan','Feb','Mar','Dec'], axis=1) # change this to correct month names
)
Or with pd.crosstab
:
pd.crosstab(
index=df.data_date.dt.year,
columns=df.data_date.dt.strftime('%b'),
values=df['value'],
aggfunc='sum'
).reindex(['Jan','Feb','Mar','Dec'], axis=1)
Output:
month Jan Feb Mar Dec
year
2016 3.0 NaN NaN NaN
2017 NaN 7.0 5.0 NaN
CodePudding user response:
You can use:
df['data_date'] = pd.to_datetime(df['data_date'])
df['year'] = df['data_date'].dt.year
df['month'] = df['data_date'].dt.month_name().str[:3]
df = df.pivot_table(index='year', columns='month', values='value', aggfunc='sum')
CodePudding user response:
If you need all the months of the year, a solution with resample
might be helpful. There's no point doing this if you do not need it :
df['data_date'] = pd.to_datetime(df['data_date'])
grp = df.set_index('data_date')
grp = grp.resample('M').sum().reset_index()
grp = grp.assign(year = grp.data_date.dt.year,
month = grp.data_date.dt.month_name().str[:3])
grp['month'] = grp['month'].astype(pd.CategoricalDtype(grp.month.unique(),
ordered=True))
grp.pivot('year', 'month', 'value')
month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
year
2016 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2017 0.0 7.0 5.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN