How to get calendar years as column names and month and day as index for one timeseries-CodePudding

I have looked for solutions but seem to find none that point me in the right direction, hopefully someone on here can help. I have a stock price data set, with a frequency of Month Start. I am trying get an output where the calendar years are the column names, and the day and month will be the index (there will only be 12 rows since it is monthly data). The rows will be filled with the stock prices corresponding to the year and month. I unforunately have no code since I have looked at for loops, groupby, etc but can seem to figure this one out.

CodePudding user response：

You might want to split the date into month and year and to apply a pivot:

s = pd.to_datetime(df.index)

out = (df
 .assign(year=s.year, month=s.month)
 .pivot_table(index='month', columns='year', values='Close', fill_value=0)
)

output:

year   2003  2004
month            
1         0     2
2         0     3
3         0     4
12        1     0

Used input:

df = pd.DataFrame({'Close': [1,2,3,4]},
                  index=['2003-12-01', '2004-01-01', '2004-02-01', '2004-03-01'])

CodePudding user response：

You need multiple steps to do that.

First split your column into the right format. Then convert this column into two separate columns. Then pivot the table accordingly.

import pandas as pd

# Test Dataframe
df = pd.DataFrame({'Date': ['2003-12-01', '2004-01-01', '2004-02-01', '2004-12-01'],
                   'Close': [6.661, 7.053, 6.625, 8.999]})

# Split datestring into list of form [year, month-day]
df = df.assign(Date=df.Date.str.split(pat='-', n=1))
# Separate date-list column into two columns
df = pd.DataFrame(df.Date.to_list(), columns=['Year', 'Date'], index=df.index).join(df.Close)
# Pivot the table
df = df.pivot(columns='Year', index='Date')
df

Output:

       Close       
Year    2003   2004
Date               
01-01    NaN  7.053
02-01    NaN  6.625
12-01  6.661  8.999