Home > Mobile >  pandas explode functionality not working properly
pandas explode functionality not working properly

Time:09-30

I have data frame like this:

     title            price        weight
   0 Crloni Model145  $45,$50,$60  200gm,500gm,800gm

here $45 for 200gm, $50 for 500gm and $60 for 800 gm. My expected data frame will be look like this:

    title              price      weight
    0 Crloni Model145  $45        200gm 
    1 Crloni Model145  $50        500gm
    2 Crloni Model145  $60        800gm
   

But now I getting data frame like this

       title           price      weight
    0 Crloni Model145  $45        200gm 
    1 Crloni Model145  $45        500gm
    2 Crloni Model145  $45        800gm
    3 Crloni Model145  $50        200gm 
    4 Crloni Model145  $50        500gm
    5 Crloni Model145  $50        800gm
    6 Crloni Model145  $45        200gm 
    7 Crloni Model145  $45        500gm
    8 Crloni Model145  $45        800gm
    9 Crloni Model145  $60        200gm 
    10 Crloni Model145  $60        500gm
    11 Crloni Model145  $60        800gm

here is my code:

data['price']=data['price'].str.split(',')
data = data.explode('price')
data['weight']=data['weight'].str.split(',')
data = data.explode('weight')

updated question:

I applied Bill the Lizard solution. I am not getting any error but I am not seeing any price, weight in my csv when export the csv file.

data['price']=data['price'].str.split(',')
data['weight']=df['weight'].str.split(',')
data = data.set_index(['title']).apply(pd.Series.explode).reset_index()

data.to_csv('...mypath/data.csv')

see the pic: enter image description here

here is BENY solution but I am getting cannot reindex from a duplicate axis

data['price'] = data['price'].str.split(',')
data['weight'] = data['weight'].str.split(',')
out = data.explode(['price','weight'])
data['description'] = data['description'].mask(data['description].shift() == data['description'])

#update2

Bill the Lizard solution worked but I am not understanding why BENY solution not working? why I am getting cannot reindex from a duplicate axis this error when applying BENY solution

update3 few row of my original excel file

     category     title       price     weight             description
       Shirt    men-shirt    20,25,35    100gm,50gm,150gm   shirt description....   
      pant    men-pent    40,35,90    200gm,350gm,150gm   pant description....   

CodePudding user response:

Update your pandas and explode now can accept two columns

df['price'] = df['price'].str.split(',')
df['weight'] = df['weight'].str.split(',')
out = df.explode(['price','weight'])

CodePudding user response:

If you have a Pandas version prior to 1.3.0, where multi-column explode was added:

Since the lists after splitting the strings have the same number of elements, you can apply Series.explode to the price and weight columns to the the expected output.

import pandas as pd

df = pd.DataFrame({'title': ['Crloni Model145'],
                   'price': ['$45,$50,$60'],
                   'weight': ['200gm,500gm,800gm']})

df['price']=df['price'].str.split(',')
df['weight']=df['weight'].str.split(',')

df = df.set_index(['title']).apply(pd.Series.explode).reset_index()

print(df)

I set the index to title because I don't want explode to be applied to that column, then I reset the index at the end so title becomes a regular column again.

Output:

             title price weight
0  Crloni Model145   $45  200gm
1  Crloni Model145   $50  500gm
2  Crloni Model145   $60  800gm
  • Related