I have a dataframe with values like such (sourced from somewhere else - I can't change the source data, unfortunately).
Store/Date | Sales |
---|---|
Store1 | nan |
10/15/21 | 6712 |
10/16/21 | 7108 |
10/17/21 | 4120 |
Store2 | nan |
10/15/21 | 5123 |
10/16/21 | 9012 |
10/17/21 | 4241 |
I would like to turn this into the following dataframe:
Store | Date | Sales |
---|---|---|
Store1 | 10/15/21 | 6712 |
Store1 | 10/16/21 | 7108 |
Store1 | 10/17/21 | 4120 |
Store2 | 10/15/21 | 5123 |
Store2 | 10/16/21 | 9012 |
Store2 | 10/17/21 | 4241 |
I should note, Stores have a unique string on the front (e.g. 'Store' in this example), so I have a list of all the stores separated from the dates already. (If you've got a better way of doing that part, too, that would be great.)
One approach I thought of is to loop through the whole DF, and have the iterator be the current store, and create a new dataframe with that, but I know looping is not preferred (and this data has millions of rows, so not fast).
Is there a good pandas way of doing this?
Thanks!
Here is some code to generate that example df:
import pandas as pd
import numpy as np
df = pd.DataFrame(columns = ['Store/Date','Sales'],
data = [['Store1',np.nan],
['10/15/21',6712],
['10/16/21',7108],
['10/17/21',4120],
['Store2',np.nan],
['10/15/21',5123],
['10/16/21',9012],
['10/17/21',4241]])
stores = df[df['Store/Date'].str.contains('Store', na = False)]
stores = stores.iloc[:,0]
CodePudding user response:
Use DataFrame.insert
for new column for first position with Series.where
and forward filling missing values, then remove rows with invert mask in boolean indexing
and last use rename
:
m = df['Store/Date'].str.contains('Store', na = False)
df.insert(0, 'Store', df['Store/Date'].where(m).ffill())
df = df[~m].rename(columns={'Store/Date':'Date'})
print (df)
Store Date Sales
1 Store1 10/15/21 6712.0
2 Store1 10/16/21 7108.0
3 Store1 10/17/21 4120.0
5 Store2 10/15/21 5123.0
6 Store2 10/16/21 9012.0
7 Store2 10/17/21 4241.0