I have the CSV file below:
timestamp,store,customer_name,basket_items,total_price,cash_or_card,card_number
06/06/2022 09:00,Chesterfield,Stephanie Neyhart,"Large Flat white - 2.45, Large Flavoured iced latte - Vanilla - 3.25, Large Flavoured iced latte - Hazelnut - 3.25",8.95,CASH,
I want to split basket_items into a product and a price for each item, like this:
product price
Large Flat white 2.45
Large Flavoured iced latte - Vanilla 3.25
Large Flavoured iced latte - Hazelnut 3.25
How can I do that with a pandas DataFrame?
CodePudding user response:
Try this:
import pandas as pd

# sample data
df = pd.DataFrame([{'timestamp': '06/06/2022 09:00',
'store': 'Chesterfield',
'customer_name': 'Stephanie Neyhart',
'basket_items': "Large Flat white - 2.45, Large Flavoured iced latte - Vanilla - 3.25, Large Flavoured iced latte - Hazelnut - 3.25",
'total_price': 8.95,
'cash_or_card': 'CASH'}])
# split by comma and explode (to separate products into multi-rows)
# split by dash once from the right side to separate product from price
res = df.basket_items.str.split(', ').explode().str.rsplit(' - ', n=1, expand=True)
# set column names
res.columns = ['product', 'price']
res
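Note that price comes out of the split as a string. If numeric prices are needed, and each item should stay linked to its order, the exploded result can be joined back onto the original frame through the index that explode() preserves. This is a minimal sketch assuming the same basket_items format as above; the names items and orders and the choice of order columns are illustrative.

```python
import pandas as pd

df = pd.DataFrame([{
    'timestamp': '06/06/2022 09:00',
    'store': 'Chesterfield',
    'customer_name': 'Stephanie Neyhart',
    'basket_items': "Large Flat white - 2.45, Large Flavoured iced latte - Vanilla - 3.25, Large Flavoured iced latte - Hazelnut - 3.25",
    'total_price': 8.95,
    'cash_or_card': 'CASH',
}])

# one row per item; explode() repeats the original row index for each item
items = (
    df['basket_items']
    .str.split(', ')
    .explode()
    .str.rsplit(' - ', n=1, expand=True)
)
items.columns = ['product', 'price']

# the split yields strings, so convert price to a numeric dtype
items = items.astype({'price': float})

# join on the shared index to attach order-level columns to every item
orders = df[['timestamp', 'store', 'customer_name']].join(items)
print(orders)
```

Because the join is on the index, this relies on df having a unique index before the explode.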
CodePudding user response:
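import pandas as pd

df = pd.read_csv('df.csv')

def extract_prods(row):
    # split the basket into items, then split each item once from the right
    # so that dashes inside the product name are preserved
    return [
        {key: val.strip() for key, val in zip(['product', 'price'], prod.rsplit('-', 1))}
        for prod in row.split(', ')
    ]

# flatten the per-row lists into one list of dicts and build a DataFrame
pd.DataFrame(sum(df['basket_items'].apply(extract_prods), []))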
| product | price |
|---|---|
| Large Flat white | 2.45 |
| Large Flavoured iced latte - Vanilla | 3.25 |
| Large Flavoured iced latte - Hazelnut | 3.25 |
| Large Flat white | 2.45 |
| Large Flavoured iced latte - Vanilla | 3.25 |
| Large Flavoured iced latte - Hazelnut | 3.25 |
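For a self-contained check of the same idea, the sketch below parses the sample row from the question through io.StringIO (standing in for df.csv), converts each price to float, and flattens the per-row lists with itertools.chain rather than sum(..., []), which copies the accumulated list on every addition. The names csv_text and products are illustrative and not part of the original answer.

```python
import io
from itertools import chain

import pandas as pd

# the sample data from the question; the quoted field keeps its inner commas
csv_text = '''timestamp,store,customer_name,basket_items,total_price,cash_or_card,card_number
06/06/2022 09:00,Chesterfield,Stephanie Neyhart,"Large Flat white - 2.45, Large Flavoured iced latte - Vanilla - 3.25, Large Flavoured iced latte - Hazelnut - 3.25",8.95,CASH,
'''
df = pd.read_csv(io.StringIO(csv_text))


def extract_prods(basket):
    """Turn one basket string into a list of {'product', 'price'} dicts."""
    rows = []
    for prod in basket.split(', '):
        # split once from the right so dashes in the product name survive
        name, price = prod.rsplit(' - ', 1)
        rows.append({'product': name.strip(), 'price': float(price)})
    return rows


# chain the per-row lists together and build one DataFrame from them
products = pd.DataFrame(
    list(chain.from_iterable(df['basket_items'].map(extract_prods)))
)
print(products)
```

With the single sample row this should give three rows, one per basket item.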