Home > database >  Join the columns and list them one by one in row
Join the columns and list them one by one in row

Time:02-06

I have Pandas dataset with orders and different types of packaging inside this order separated by comma inside the cell.

order container box
a c1,c2,c3 b1,b2,b3
b c4,c5,c6 b4,b5,b6

Need to get table with two columns: "order" and "content" with all values from both container and box.

I could only merge the container and box - but do not know how to list them row by row.

Needed table is:

order content
a c1
a c2
a c3
a b1
a b2
a b3
b c4
b c5
b c6
b b4
b b5
b b6

CodePudding user response:

You can stack, split explode and convert to DataFrame:

out = (df.set_index('order').stack() # set other columns aside and stack
         .str.split(',').explode() # expand values to multiple rows
          # cleanup
         .reset_index('order', name='content').reset_index(drop=True)
      )
print(out)

Output:

   order content
0      a      c1
1      a      c2
2      a      c3
3      a      b1
4      a      b2
5      a      b3
6      b      c4
7      b      c5
8      b      c6
9      b      b4
10     b      b5
11     b      b6

Alternative with melt:

(df.melt('order', value_name='content')
   .assign(content=lambda d: d['content'].str.split(','))
   .explode('content').drop(columns='variable')
)

CodePudding user response:

You could also use pd.DataFrame.melt, pd.DataFrame.set_index, pd.DataFrame.pipe, pd.DataFrame.sort_index:

(df
 .melt(id_vars='order', value_vars=['container', 'box'])
 .set_index('order')
 .pipe(lambda x: x['value'].str.split(',').explode())
 .sort_index()
)

order
a    c1
a    c2
a    c3
a    b1
a    b2
a    b3
b    c4
b    c5
b    c6
b    b4
b    b5
b    b6

or even a more concise approach::

(df
 .melt(id_vars='order', value_vars=['container', 'box'])
 .set_index('order')['value'].str.split(',').explode()
 .sort_index())

CodePudding user response:

I would do it following way

import pandas as pd
df = pd.DataFrame({"order":["a","b"],"container":["c1,c2,c3","c4,c5,c6"],"box":["b1,b2,b3","b4,b5,b6"]})
df2 = pd.concat([df.order,df.container.str.split(",",expand=True),df.box.str.split(",",expand=True)],axis=1)
df2 = df2.melt("order")[["order","value"]]
print(df2)

output

   order value
0      a    c1
1      b    c4
2      a    c2
3      b    c5
4      a    c3
5      b    c6
6      a    b1
7      b    b4
8      a    b2
9      b    b5
10     a    b3
11     b    b6

Explanation: use .str.split with expand=True to get ,-sheared values into separate columns, then use .concat to create DataFrame from them, then use .melt at order to get desired result and then select required columns.

CodePudding user response:

Because you have two columns, you need to join columns with ',' then split and explode like the below:

(df['container']   ","   df['box']).str.split(',').explode().to_frame().reset_index().rename(columns={'index':'order', 0:'content'})

    order content
0       0      c1
1       0      c2
2       0      c3
3       0      b1
4       0      b2
5       0      b3
6       1      c4
7       1      c5
8       1      c6
9       1      b4
10      1      b5
11      1      b6
  • Related