I have Pandas dataset with orders and different types of packaging inside this order separated by comma inside the cell.
order | container | box |
---|---|---|
a | c1,c2,c3 | b1,b2,b3 |
b | c4,c5,c6 | b4,b5,b6 |
Need to get table with two columns: "order" and "content" with all values from both container and box.
I could only merge the container and box - but do not know how to list them row by row.
Needed table is:
order | content |
---|---|
a | c1 |
a | c2 |
a | c3 |
a | b1 |
a | b2 |
a | b3 |
b | c4 |
b | c5 |
b | c6 |
b | b4 |
b | b5 |
b | b6 |
CodePudding user response:
You can stack
, split
explode
and convert to DataFrame:
out = (df.set_index('order').stack() # set other columns aside and stack
.str.split(',').explode() # expand values to multiple rows
# cleanup
.reset_index('order', name='content').reset_index(drop=True)
)
print(out)
Output:
order content
0 a c1
1 a c2
2 a c3
3 a b1
4 a b2
5 a b3
6 b c4
7 b c5
8 b c6
9 b b4
10 b b5
11 b b6
Alternative with melt
:
(df.melt('order', value_name='content')
.assign(content=lambda d: d['content'].str.split(','))
.explode('content').drop(columns='variable')
)
CodePudding user response:
You could also use pd.DataFrame.melt
, pd.DataFrame.set_index
, pd.DataFrame.pipe
, pd.DataFrame.sort_index
:
(df
.melt(id_vars='order', value_vars=['container', 'box'])
.set_index('order')
.pipe(lambda x: x['value'].str.split(',').explode())
.sort_index()
)
order
a c1
a c2
a c3
a b1
a b2
a b3
b c4
b c5
b c6
b b4
b b5
b b6
or even a more concise approach::
(df
.melt(id_vars='order', value_vars=['container', 'box'])
.set_index('order')['value'].str.split(',').explode()
.sort_index())
CodePudding user response:
I would do it following way
import pandas as pd
df = pd.DataFrame({"order":["a","b"],"container":["c1,c2,c3","c4,c5,c6"],"box":["b1,b2,b3","b4,b5,b6"]})
df2 = pd.concat([df.order,df.container.str.split(",",expand=True),df.box.str.split(",",expand=True)],axis=1)
df2 = df2.melt("order")[["order","value"]]
print(df2)
output
order value
0 a c1
1 b c4
2 a c2
3 b c5
4 a c3
5 b c6
6 a b1
7 b b4
8 a b2
9 b b5
10 a b3
11 b b6
Explanation: use .str.split
with expand=True
to get ,
-sheared values into separate columns, then use .concat
to create DataFrame
from them, then use .melt
at order
to get desired result and then select required columns.
CodePudding user response:
Because you have two columns, you need to join columns with ','
then split and explode like the below:
(df['container'] "," df['box']).str.split(',').explode().to_frame().reset_index().rename(columns={'index':'order', 0:'content'})
order content
0 0 c1
1 0 c2
2 0 c3
3 0 b1
4 0 b2
5 0 b3
6 1 c4
7 1 c5
8 1 c6
9 1 b4
10 1 b5
11 1 b6