Home > OS >  Split date index to yearly index and monthly columns
Split date index to yearly index and monthly columns

Time:10-21

I have a DataFrame like this:

data_date   value
2016-01-01  1
2016-01-02  2
2017-02-05  3
2017-02-07  4
2017-03-09  5

I need to convert it to a table with years as the index and months as the columns. (aggregate using sum)

The final output should look like this

      Jan | Feb | Mar | Apr | ........... Dec |
2016  3   | xx  | xx  | xx  | ............    |
2017  xx  | 7   | 5   | xx  | ............    |

Here it what I did:

To make it easy to copy:

import pandas as pd
df=pd.DataFrame([
 {'data_date': '2016-01-01', 'value': 1},
 {'data_date': '2016-01-02', 'value': 2},
 {'data_date': '2017-02-05', 'value': 3},
 {'data_date': '2017-02-07', 'value': 4},
 {'data_date': '2017-03-09', 'value': 5}])

I used TimeGrouper to first aggregate it to monthly like so:

df['data_date'] = pd.to_datetime(df['data_date'])
df.set_index('data_date', inplace=True)
grp_df = df.groupby([pd.Grouper(freq='M')]).sum()

So now I have data aggregated to each row as a month/Yr. I'm stuck as to how to make months the columns and year the row.

Can you help me with that please?

CodePudding user response:

Try pivot_table:

(df.assign(year=df.data_date.dt.year, month=df.data_date.dt.strftime('%b'))
   .pivot_table(index='year', columns='month', values='value', aggfunc='sum')
   .reindex(['Jan','Feb','Mar','Dec'], axis=1)   # change this to correct month names
)

Or with pd.crosstab:

pd.crosstab(
    index=df.data_date.dt.year,
    columns=df.data_date.dt.strftime('%b'),
    values=df['value'],
    aggfunc='sum'
).reindex(['Jan','Feb','Mar','Dec'], axis=1)

Output:

month  Jan  Feb  Mar  Dec
year                     
2016   3.0  NaN  NaN  NaN
2017   NaN  7.0  5.0  NaN

CodePudding user response:

You can use:

df['data_date'] = pd.to_datetime(df['data_date'])
df['year'] = df['data_date'].dt.year
df['month'] = df['data_date'].dt.month_name().str[:3]
df = df.pivot_table(index='year', columns='month', values='value', aggfunc='sum')

CodePudding user response:

If you need all the months of the year, a solution with resample might be helpful. There's no point doing this if you do not need it :

df['data_date'] = pd.to_datetime(df['data_date'])
grp = df.set_index('data_date')
grp = grp.resample('M').sum().reset_index()
grp = grp.assign(year = grp.data_date.dt.year, 
                 month = grp.data_date.dt.month_name().str[:3])
grp['month'] = grp['month'].astype(pd.CategoricalDtype(grp.month.unique(),
                                                       ordered=True))
grp.pivot('year', 'month', 'value')

month  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
year
2016   3.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
2017   0.0  7.0  5.0  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
  • Related