How to separate column in dataframe pandas-CodePudding

I have DataFrame containing values about shops and categories in one column.

Date	Spent	...	Category/Shop
2022-08-04	126.98	...	Supermarkets
2022-08-04	NaN	...	ShopName
2022-08-04	119.70	...	Supermarkets
2022-08-04	NaN	...	ShopName

...

I need to separate last column into to columns:

Date	Spent	...	Category	Shop
2022-08-04	126.98	...	Supermarkets	ShopName
2022-08-04	119.70	...	Supermarkets	ShopName

How can this be done?

We can assume that every second row in the Category/Shop column contains the name of the store that needs to be moved to a new column.

CodePudding user response：

Based on the sample and expecting future similar behavior I would do it with groupby

df = df.fillna(method='ffill').groupby(['Date','Spent'])['Category/Shop'].apply(list).reset_index()
df['Category'],df['Shop'] = df['Category/Shop'].str[0],df['Category/Shop'].str[1]
df = df.drop(columns='Category/Shop')

Outputting:

         Date   Spent      Category      Shop
0  2022-08-04  119.70  Supermarkets  ShopName
1  2022-08-04  126.98  Supermarkets  ShopName

CodePudding user response：

Apply the pandas series str. split() function on the “Address” column and pass the delimiter (comma in this case) on which you want to split the column. Also, make sure to pass True to the expand parameter.

CodePudding user response：

I would go for iloc to retrieve every second row, and build a new dataframe with pd.concat.

Given df is your source DataFrame, it would look like this:

pd.concat(
    [
        # start at the first row and use every 2nd row of df
        df.iloc[::2].reset_index(drop=True),
        # start at the second row and use every 2nd row of df, but only the last column
        df.iloc[1::2]["category/shop"].reset_index(drop=True)
    ],
    # concatenate along columns
    axis=1
)

Output:

         date   spent category/shop category/shop
0  2022-08-04  126.98  Supermarkets      ShopName
1  2022-08-04  119.70  Supermarkets      ShopName