How can I split DataFrame into two different columns?

Time:06-07

I have the following CSV file:

timestamp,store,customer_name,basket_items,total_price,cash_or_card,card_number
06/06/2022 09:00,Chesterfield,Stephanie Neyhart,"Large Flat white - 2.45, Large Flavoured iced latte - Vanilla - 3.25, Large Flavoured iced latte - Hazelnut - 3.25",8.95,CASH,

I want to split basket_items into separate product and price columns, like this:

product                                   price 
Large Flat white                          2.45
Large Flavoured iced latte - Vanilla      3.25
Large Flavoured iced latte - Hazelnut     3.25

How can I do that with a pandas DataFrame?

CodePudding user response:

Try this

import pandas as pd

# sample data
df = pd.DataFrame([{'timestamp': '06/06/2022 09:00',
                    'store': 'Chesterfield',
                    'customer_name': 'Stephanie Neyhart',
                    'basket_items': "Large Flat white - 2.45, Large Flavoured iced latte - Vanilla - 3.25, Large Flavoured iced latte - Hazelnut - 3.25",
                    'total_price': 8.95,
                    'cash_or_card': 'CASH'}])
# split by comma and explode (to separate products into multi-rows)
# split by dash once from the right side to separate product from price
res = df.basket_items.str.split(', ').explode().str.rsplit(' - ', n=1, expand=True)
# set column names
res.columns = ['product', 'price']
res
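
A possible follow-up, sketched under the assumption that you also want the prices as numbers and the order-level columns (store, customer, and so on) next to each item. Because explode() keeps the original row label for every item, the result can be joined back to df on the index. The name items is only illustrative and not part of the answer above.

```python
import pandas as pd

# Same sample row as in the answer above
df = pd.DataFrame([{
    'timestamp': '06/06/2022 09:00',
    'store': 'Chesterfield',
    'customer_name': 'Stephanie Neyhart',
    'basket_items': "Large Flat white - 2.45, Large Flavoured iced latte - Vanilla - 3.25, Large Flavoured iced latte - Hazelnut - 3.25",
    'total_price': 8.95,
    'cash_or_card': 'CASH',
}])

# One row per item; explode() repeats the original row label for each item
res = df['basket_items'].str.split(', ').explode().str.rsplit(' - ', n=1, expand=True)
res.columns = ['product', 'price']

# The split produces strings, so convert prices before doing arithmetic
res['price'] = pd.to_numeric(res['price'])

# Join on the shared index to bring back the order-level columns
# ("items" is a hypothetical name used only for this sketch)
items = df[['timestamp', 'store', 'customer_name']].join(res)
print(items)
```

With more than one order in the file, each item row should still carry the timestamp, store and customer of the order it came from, since the join is done on the original row index.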


CodePudding user response:

import pandas as pd

df = pd.read_csv('df.csv')

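def extract_prods(row):
    # split the basket into items, then split each item once from the right
    # so that only the trailing price is separated from the product name
    return [
        {key: val.strip() for key, val in zip(['product', 'price'], prod.rsplit('-', 1))}
        for prod in row.split(', ')
    ]

# concatenate the per-row lists of dicts and build one DataFrame from them
pd.DataFrame(sum(df['basket_items'].apply(extract_prods), []))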

Output:

product                                   price
Large Flat white                          2.45
Large Flavoured iced latte - Vanilla      3.25
Large Flavoured iced latte - Hazelnut     3.25
Large Flat white                          2.45
Large Flavoured iced latte - Vanilla      3.25
Large Flavoured iced latte - Hazelnut     3.25
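
To see why both answers split from the right with a limit of one, here is a minimal plain-Python sketch using one item string from the question. Some product names contain the separator themselves, so an unlimited split would cut the name apart.

```python
item = "Large Flavoured iced latte - Vanilla - 3.25"

# Splitting on every ' - ' breaks the product name into pieces
print(item.split(' - '))   # ['Large Flavoured iced latte', 'Vanilla', '3.25']

# Splitting once from the right separates only the trailing price
product, price = item.rsplit(' - ', 1)
print(product)             # Large Flavoured iced latte - Vanilla
print(price)               # 3.25
```

The pandas version, .str.rsplit(' - ', n=1, expand=True), applies the same idea to every element of the Series.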