Home > Enterprise >  How to create multiple columns from one column using values from another column?
How to create multiple columns from one column using values from another column?

Time:11-29

I have a dataset of time-series data with almost 100 stores. Time period is couple of years (but not for all stores - there are some older and newer ones).

Store_num date Sales
23 01Jan2012 125
23 01Feb2012 12
23 01Mar2012 388
... ... ...
57 01Jan2013 456
57 01Feb2013 555
57 01Mar2013 545
57 01Apr2013 657
... ... ...

To perform predictions of future sales (first approach with statsmodels) I decided to change that df to:

date 23_sales 57_sales
01Jan2012 125 NaN
01Feb2012 12 NaN
01Mar2012 388 NaN
... ... ...
01Jan2013 ... 456
01Feb2013 ... 555
01Mar2013 ... 545
01Apr2013 ... 657
... ... ...

I am able to add those columns one by one but I fail with creating some kind of loop. My approach for doing it one by one:

store_23 = df[df['Store_num'] == 23].copy()
store_23.set_index(store_23['date'], inplace = True)
store_23.drop(['Store', 'date'], axis = 1, inplace = True)
store_23.columns = ['23_sales']

I was thinking also of creating a list of sales names (column names):

df_list = df['Store_num'].unique()
y_list = [str(num)   "_y" for num in df_list]

and than creating new df and adding data using groupby but I also failed in this approach.

Can anyone give me some suggestions ? Maybe my approach is completly wrong ?

CodePudding user response:

iiuc you want df.pivot:

df = pd.DataFrame(
    {
        'store_num':[0,0,0,1,1,1,2,2,2],
        'date':[1,2,3]*3,
        'sales':np.random.randint(0,10,9),
    }
)

df.pivot(index='date', columns='store_num')
  • Related