I have a dataset of time-series data with almost 100 stores. Time period is couple of years (but not for all stores - there are some older and newer ones).
Store_num | date | Sales |
---|---|---|
23 | 01Jan2012 | 125 |
23 | 01Feb2012 | 12 |
23 | 01Mar2012 | 388 |
... | ... | ... |
57 | 01Jan2013 | 456 |
57 | 01Feb2013 | 555 |
57 | 01Mar2013 | 545 |
57 | 01Apr2013 | 657 |
... | ... | ... |
To perform predictions of future sales (first approach with statsmodels) I decided to change that df to:
date | 23_sales | 57_sales |
---|---|---|
01Jan2012 | 125 | NaN |
01Feb2012 | 12 | NaN |
01Mar2012 | 388 | NaN |
... | ... | ... |
01Jan2013 | ... | 456 |
01Feb2013 | ... | 555 |
01Mar2013 | ... | 545 |
01Apr2013 | ... | 657 |
... | ... | ... |
I am able to add those columns one by one but I fail with creating some kind of loop. My approach for doing it one by one:
store_23 = df[df['Store_num'] == 23].copy()
store_23.set_index(store_23['date'], inplace = True)
store_23.drop(['Store', 'date'], axis = 1, inplace = True)
store_23.columns = ['23_sales']
I was thinking also of creating a list of sales names (column names):
df_list = df['Store_num'].unique()
y_list = [str(num) "_y" for num in df_list]
and than creating new df and adding data using groupby but I also failed in this approach.
Can anyone give me some suggestions ? Maybe my approach is completly wrong ?
CodePudding user response:
iiuc you want df.pivot:
df = pd.DataFrame(
{
'store_num':[0,0,0,1,1,1,2,2,2],
'date':[1,2,3]*3,
'sales':np.random.randint(0,10,9),
}
)
df.pivot(index='date', columns='store_num')