Home > OS >  Convert the values in dataframe to more usable format for Time series modeling
Convert the values in dataframe to more usable format for Time series modeling

Time:02-01

My dataframe has values of how many red cars are sold on a specific month. I have to build a predictive model to predict monthly sale Current Data frame

I want the current data frame to be converted into the format below for time series modeling.

Future Data Frame

How can I read the column and row header to create a date column? I am hoping for a new data frame.

CodePudding user response:

You can use melt() to transform the dataframe from the wide to the long format. Then we combine the Year and month information to make an actual date:

import pandas as pd

df = pd.DataFrame({'YEAR' : [2021,2022],
                  'JAN' : [5, 232],
                  'FEB':[545, 48]})

df2 = df.melt(id_vars = ['YEAR'], var_name = 'month', value_name = 'sales')

df2['date'] = df2.apply(lambda row: pd.to_datetime(str(row['YEAR'])   row['month'], format = '%Y%b'), axis = 1)
df2.sort_values('date')[['date', 'sales']]

this gives the output:


    date        sales
0   2021-01-01  5
2   2021-02-01  545
1   2022-01-01  232
3   2022-02-01  48

(for time series analysis you would probably want to set the date column as index)

  • Related