I have data frame like this:
title price weight
0 Crloni Model145 $45,$50,$60 200gm,500gm,800gm
here $45 for 200gm, $50 for 500gm and $60 for 800 gm. My expected data frame will be look like this:
title price weight
0 Crloni Model145 $45 200gm
1 Crloni Model145 $50 500gm
2 Crloni Model145 $60 800gm
But now I getting data frame like this
title price weight
0 Crloni Model145 $45 200gm
1 Crloni Model145 $45 500gm
2 Crloni Model145 $45 800gm
3 Crloni Model145 $50 200gm
4 Crloni Model145 $50 500gm
5 Crloni Model145 $50 800gm
6 Crloni Model145 $45 200gm
7 Crloni Model145 $45 500gm
8 Crloni Model145 $45 800gm
9 Crloni Model145 $60 200gm
10 Crloni Model145 $60 500gm
11 Crloni Model145 $60 800gm
here is my code:
data['price']=data['price'].str.split(',')
data = data.explode('price')
data['weight']=data['weight'].str.split(',')
data = data.explode('weight')
updated question:
I applied Bill the Lizard solution. I am not getting any error but I am not seeing any price, weight in my csv when export the csv file.
data['price']=data['price'].str.split(',')
data['weight']=df['weight'].str.split(',')
data = data.set_index(['title']).apply(pd.Series.explode).reset_index()
data.to_csv('...mypath/data.csv')
here is BENY solution but I am getting cannot reindex from a duplicate axis
data['price'] = data['price'].str.split(',')
data['weight'] = data['weight'].str.split(',')
out = data.explode(['price','weight'])
data['description'] = data['description'].mask(data['description].shift() == data['description'])
#update2
Bill the Lizard solution worked but I am not understanding why BENY solution not working? why I am getting cannot reindex from a duplicate axis
this error when applying BENY solution
update3 few row of my original excel file
category title price weight description
Shirt men-shirt 20,25,35 100gm,50gm,150gm shirt description....
pant men-pent 40,35,90 200gm,350gm,150gm pant description....
CodePudding user response:
Update your pandas
and explode
now can accept two columns
df['price'] = df['price'].str.split(',')
df['weight'] = df['weight'].str.split(',')
out = df.explode(['price','weight'])
CodePudding user response:
If you have a Pandas version prior to 1.3.0, where multi-column explode was added:
Since the lists after splitting the strings have the same number of elements, you can apply Series.explode
to the price
and weight
columns to the the expected output.
import pandas as pd
df = pd.DataFrame({'title': ['Crloni Model145'],
'price': ['$45,$50,$60'],
'weight': ['200gm,500gm,800gm']})
df['price']=df['price'].str.split(',')
df['weight']=df['weight'].str.split(',')
df = df.set_index(['title']).apply(pd.Series.explode).reset_index()
print(df)
I set the index to title
because I don't want explode
to be applied to that column, then I reset the index at the end so title
becomes a regular column again.
Output:
title price weight
0 Crloni Model145 $45 200gm
1 Crloni Model145 $50 500gm
2 Crloni Model145 $60 800gm