Home > Software design >  Pandas - Creating Two Columns From One Column With Intermixed Values
Pandas - Creating Two Columns From One Column With Intermixed Values

Time:11-03

I have a dataframe with values like such (sourced from somewhere else - I can't change the source data, unfortunately).

Store/Date Sales
Store1 nan
10/15/21 6712
10/16/21 7108
10/17/21 4120
Store2 nan
10/15/21 5123
10/16/21 9012
10/17/21 4241

I would like to turn this into the following dataframe:

Store Date Sales
Store1 10/15/21 6712
Store1 10/16/21 7108
Store1 10/17/21 4120
Store2 10/15/21 5123
Store2 10/16/21 9012
Store2 10/17/21 4241

I should note, Stores have a unique string on the front (e.g. 'Store' in this example), so I have a list of all the stores separated from the dates already. (If you've got a better way of doing that part, too, that would be great.)

One approach I thought of is to loop through the whole DF, and have the iterator be the current store, and create a new dataframe with that, but I know looping is not preferred (and this data has millions of rows, so not fast).

Is there a good pandas way of doing this?

Thanks!

Here is some code to generate that example df:

import pandas as pd
import numpy as np
df = pd.DataFrame(columns = ['Store/Date','Sales'],
                  data = [['Store1',np.nan],
                          ['10/15/21',6712],
                          ['10/16/21',7108],
                          ['10/17/21',4120],
                          ['Store2',np.nan],
                          ['10/15/21',5123],
                          ['10/16/21',9012],
                          ['10/17/21',4241]])

stores = df[df['Store/Date'].str.contains('Store', na = False)]
stores = stores.iloc[:,0]

CodePudding user response:

Use DataFrame.insert for new column for first position with Series.where and forward filling missing values, then remove rows with invert mask in boolean indexing and last use rename:

m = df['Store/Date'].str.contains('Store', na = False)
df.insert(0, 'Store', df['Store/Date'].where(m).ffill())
df = df[~m].rename(columns={'Store/Date':'Date'})
print (df)
    Store      Date   Sales
1  Store1  10/15/21  6712.0
2  Store1  10/16/21  7108.0
3  Store1  10/17/21  4120.0
5  Store2  10/15/21  5123.0
6  Store2  10/16/21  9012.0
7  Store2  10/17/21  4241.0
  • Related