import numpy as np
import pandas as pd
import os
import xlsxwriter
original_data = pd.read_excel('purchase_record.xlsx')
original_data.head()
df = pd.DataFrame(original_data)
filter_data = df.set_index(['item', 'Region', 'City Code', 'Area', 'City Name',
'price']).apply(lambda x: x.str.split('/').explode()).reset_index()
now my first row has shop name that is marketA/marketB with the same price 100
i want to make it to two rows and the price divided by the number of market in shop name
Shop Name item Region City Code Area City Name price
MarketA/MarketB Apple A 10201 East B 100
MarketC Grape B 20010 West C 90
MarketD/MarketE Orange C 30132 North D 84
- MarketA Apple A 10201 East B 50
- MarketB Apple A 10201 East B 50
- MarketC Grape B 20010 West C 90
- MarketD Orange C 30132 North D 42
- MarketE Orange C 30132 North D 42
CodePudding user response:
Basing on numpy.where
(check condition for "jointed" shops) and pandas.DataFrame.explode
method:
cond = df['Shop Name'].str.contains('/')
df['price'] = np.where(cond, df['price'] / 2, df['price'])
df['Shop Name'] = np.where(cond, df['Shop Name'].str.split('/'), df['Shop Name'])
df = df.explode('Shop Name')
print(df)
The output:
Shop Name item Region City Code Area City Name price
0 MarketA Apple A 10201 East B 50.0
0 MarketB Apple A 10201 East B 50.0
1 MarketC Grape B 20010 West C 90.0
2 MarketD Orange C 30132 North D 42.0
2 MarketE Orange C 30132 North D 42.0