My dataframe has values of how many red cars are sold on a specific month. I have to build a predictive model to predict monthly sale
I want the current data frame to be converted into the format below for time series modeling.
How can I read the column and row header to create a date column? I am hoping for a new data frame.
CodePudding user response:
You can use melt()
to transform the dataframe from the wide to the long format. Then we combine the Year and month information to make an actual date:
import pandas as pd
df = pd.DataFrame({'YEAR' : [2021,2022],
'JAN' : [5, 232],
'FEB':[545, 48]})
df2 = df.melt(id_vars = ['YEAR'], var_name = 'month', value_name = 'sales')
df2['date'] = df2.apply(lambda row: pd.to_datetime(str(row['YEAR']) row['month'], format = '%Y%b'), axis = 1)
df2.sort_values('date')[['date', 'sales']]
this gives the output:
date sales
0 2021-01-01 5
2 2021-02-01 545
1 2022-01-01 232
3 2022-02-01 48
(for time series analysis you would probably want to set the date column as index)