Suppose I have the df
below. I would like to combine the price columns and value columns so that all prices are in one column and all volumes are in another column. I would also like a third column that identified the price level. For example, unit1
, unit2
and unit3
.
import numpy as np
import pandas as pd
df = pd.DataFrame(
{
'uid': ['U100', 'U200', 'E100', 'E200', 'E300', 'A100', 'A200', 'A300', 'A400', 'A500'],
'location': ['US', 'US', 'EU', 'EU', 'EU', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia'],
'unit1_price': [10, 20, 15, 10, 10, 10, 20, 20, 25, 25],
'unit1_vol': [100, 150, 100, 200, 150, 150, 100, 200, 200, 200],
'unit2_price': [10, 25, 30, 20, 10, 10, 10, 10, 20, 20],
'unit2_vol': [200, 200, 150, 300, 300, 200, 150, 225, 225, 250],
'unit3_price': [0, 0, 0, 20, 20, 20, 20, 20, 20, 20],
'unit3_vol': [0, 0, 0, 500, 500, 500, 500, 500, 500, 500]
}
)
df
This is a snippet of what the final df
should look like:
I tried using melt
and I think almost have the right answer.
pd.melt(df, id_vars=['uid', 'location'], value_vars=['unit1_price', 'unit1_vol', 'unit2_price', 'unit2_vol', 'unit3_price', 'unit3_vol'])
This is what the partial df looks like with melt
:
The problem with the above is that volume and price are in the same column but I want them to be in 2 separate columns.
Did I use the right function?
CodePudding user response:
Try with melt
, then pivot
after split
s = df.melt(['uid','location'])
s[['unit','type']] =
s['variable'].str.split('_',expand=True)
s = s.pivot(index = ['uid','location','unit'],columns = ['type'],values = 'value').reset_index()
s
Out[967]:
type uid location unit price vol
0 A100 Asia unit1 10 150
1 A100 Asia unit2 10 200
2 A100 Asia unit3 20 500
3 A200 Asia unit1 20 100
4 A200 Asia unit2 10 150
CodePudding user response:
Probably, you need to split the df into 3 i.e. unit1,unit2,unit3. Apply the melt function and merge back according to their uid and location.
CodePudding user response:
maybe this:
import pandas as pd
df = pd.DataFrame(
{
'uid': ['U100', 'U200', 'E100', 'E200', 'E300', 'A100', 'A200', 'A300', 'A400', 'A500'],
'location': ['US', 'US', 'EU', 'EU', 'EU', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia'],
'unit1_price': [10, 20, 15, 10, 10, 10, 20, 20, 25, 25],
'unit1_vol': [100, 150, 100, 200, 150, 150, 100, 200, 200, 200],
'unit2_price': [10, 25, 30, 20, 10, 10, 10, 10, 20, 20],
'unit2_vol': [200, 200, 150, 300, 300, 200, 150, 225, 225, 250],
'unit3_price': [0, 0, 0, 20, 20, 20, 20, 20, 20, 20],
'unit3_vol': [0, 0, 0, 500, 500, 500, 500, 500, 500, 500]
}
)
price = pd.melt(
df, id_vars=['uid', 'location', 'unit2_vol', 'unit1_vol', 'unit3_vol'], value_vars=['unit1_price', 'unit3_price', 'unit2_price'], var_name="price", value_name="price_value"
)
res = pd.melt(
price, id_vars=['uid', 'location', 'price', 'price_value'], value_vars=['unit2_vol', 'unit1_vol', 'unit3_vol'], var_name="vol", value_name="vol_value"
)
print(res)
OUTPUT:
uid location price price_value vol vol_value
0 U100 US unit1_price 10 unit2_vol 200
1 U200 US unit1_price 20 unit2_vol 200
2 E100 EU unit1_price 15 unit2_vol 150
3 E200 EU unit1_price 10 unit2_vol 300
4 E300 EU unit1_price 10 unit2_vol 300
.. ... ... ... ... ... ...
85 A100 Asia unit2_price 10 unit3_vol 500
86 A200 Asia unit2_price 10 unit3_vol 500
87 A300 Asia unit2_price 10 unit3_vol 500
88 A400 Asia unit2_price 20 unit3_vol 500
89 A500 Asia unit2_price 20 unit3_vol 500
CodePudding user response:
You can form two dataframe using pd.melt
first and combine it back to become one dataframe.
df1 = df.melt(id_vars=['uid', 'location'], value_vars=['unit1_price', 'unit2_price', 'unit3_price'],var_name='unit',value_name='price')
df2 = df.melt(id_vars=['uid', 'location'], value_vars=['unit1_vol', 'unit2_vol', 'unit3_vol'],var_name='unit', value_name="volume")
ddf = pd.concat([df1,df2['volume']],axis=1).sort_values(by=['uid','unit'],ignore_index=True)
ddf['unit']=ddf['unit'].str.split('_',expand=True)[0]
CodePudding user response:
You can do:
df_price = df.set_index(['uid','location']).filter(
regex='price$').stack().rename_axis(
['uid', 'location', 'price_unit']).rename('price').reset_index()
df_vol = df.filter(regex='vol$').stack().rename_axis(
['', 'vol_unit']).rename('volume').reset_index(level=1).reset_index(drop=True)
df2 = pd.concat([df_price, df_vol], axis=1)
df2['unit'] = df2['price_unit'].apply(lambda x:x.split('_')[0])
df2.drop(['price_unit', 'vol_unit'],axis=1, inplace=True)
print(df2):
uid location price volume unit
0 U100 US 10 100 unit1
1 U100 US 10 200 unit2
2 U100 US 0 0 unit3
3 U200 US 20 150 unit1
4 U200 US 25 200 unit2
5 U200 US 0 0 unit3
6 E100 EU 15 100 unit1
7 E100 EU 30 150 unit2
8 E100 EU 0 0 unit3
9 E200 EU 10 200 unit1
10 E200 EU 20 300 unit2
11 E200 EU 20 500 unit3
12 E300 EU 10 150 unit1
13 E300 EU 10 300 unit2
14 E300 EU 20 500 unit3
15 A100 Asia 10 150 unit1
16 A100 Asia 10 200 unit2
17 A100 Asia 20 500 unit3
18 A200 Asia 20 100 unit1
19 A200 Asia 10 150 unit2
20 A200 Asia 20 500 unit3
21 A300 Asia 20 200 unit1
22 A300 Asia 10 225 unit2
23 A300 Asia 20 500 unit3
24 A400 Asia 25 200 unit1
25 A400 Asia 20 225 unit2
26 A400 Asia 20 500 unit3
27 A500 Asia 25 200 unit1
28 A500 Asia 20 250 unit2
29 A500 Asia 20 500 unit3
CodePudding user response:
you can do the transformation efficiently in one step from wide to long, using pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(
index = ['uid', 'location'],
names_to = ('unit', '.value'),
names_sep = '_')
)
uid location unit price vol
0 U100 US unit1 10 100
1 U200 US unit1 20 150
2 E100 EU unit1 15 100
3 E200 EU unit1 10 200
4 E300 EU unit1 10 150
5 A100 Asia unit1 10 150
6 A200 Asia unit1 20 100
7 A300 Asia unit1 20 200
8 A400 Asia unit1 25 200
9 A500 Asia unit1 25 200
10 U100 US unit2 10 200
11 U200 US unit2 25 200
12 E100 EU unit2 30 150
13 E200 EU unit2 20 300
14 E300 EU unit2 10 300
15 A100 Asia unit2 10 200
16 A200 Asia unit2 10 150
17 A300 Asia unit2 10 225
18 A400 Asia unit2 20 225
19 A500 Asia unit2 20 250
20 U100 US unit3 0 0
21 U200 US unit3 0 0
22 E100 EU unit3 0 0
23 E200 EU unit3 20 500
24 E300 EU unit3 20 500
25 A100 Asia unit3 20 500
26 A200 Asia unit3 20 500
27 A300 Asia unit3 20 500
28 A400 Asia unit3 20 500
29 A500 Asia unit3 20 500
.value
determines which part of the column remains as a header ( in this case it is price and vol), while the part not associated with .value
goes into the unit
column. names_sep
helps split the labels.
Another option is with pd.wide_to_long
, which also allows the reshaping in a single step. It does require some munging on the columns:
i = ['uid', 'location']
temp = df.set_index(i)
# reshape the columns, moving price and vol to the front
temp.columns = temp.columns.str.split('_').str[::-1].str.join('_')
(pd
.wide_to_long(
temp.reset_index(),
i = i,
j = 'unit',
stubnames = ['price', 'vol'],
sep='_',
suffix = '. ')
.reset_index()
)
uid location unit price vol
0 U100 US unit1 10 100
1 U100 US unit2 10 200
2 U100 US unit3 0 0
3 U200 US unit1 20 150
4 U200 US unit2 25 200
5 U200 US unit3 0 0
6 E100 EU unit1 15 100
7 E100 EU unit2 30 150
8 E100 EU unit3 0 0
9 E200 EU unit1 10 200
10 E200 EU unit2 20 300
11 E200 EU unit3 20 500
12 E300 EU unit1 10 150
13 E300 EU unit2 10 300
14 E300 EU unit3 20 500
15 A100 Asia unit1 10 150
16 A100 Asia unit2 10 200
17 A100 Asia unit3 20 500
18 A200 Asia unit1 20 100
19 A200 Asia unit2 10 150
20 A200 Asia unit3 20 500
21 A300 Asia unit1 20 200
22 A300 Asia unit2 10 225
23 A300 Asia unit3 20 500
24 A400 Asia unit1 25 200
25 A400 Asia unit2 20 225
26 A400 Asia unit3 20 500
27 A500 Asia unit1 25 200
28 A500 Asia unit2 20 250
29 A500 Asia unit3 20 500
You can even use the stack option, again all these options are just for you to do the transformation efficiently:
i = ['uid', 'location']
temp = df.set_index(i)
# create a MultiIndex
temp.columns = temp.columns.str.split('_', expand = True)
temp.columns.names = ['unit', None]
temp.stack('unit').reset_index()
uid location unit price vol
0 U100 US unit1 10 100
1 U100 US unit2 10 200
2 U100 US unit3 0 0
3 U200 US unit1 20 150
4 U200 US unit2 25 200
5 U200 US unit3 0 0
6 E100 EU unit1 15 100
7 E100 EU unit2 30 150
8 E100 EU unit3 0 0
9 E200 EU unit1 10 200
10 E200 EU unit2 20 300
11 E200 EU unit3 20 500
12 E300 EU unit1 10 150
13 E300 EU unit2 10 300
14 E300 EU unit3 20 500
15 A100 Asia unit1 10 150
16 A100 Asia unit2 10 200
17 A100 Asia unit3 20 500
18 A200 Asia unit1 20 100
19 A200 Asia unit2 10 150
20 A200 Asia unit3 20 500
21 A300 Asia unit1 20 200
22 A300 Asia unit2 10 225
23 A300 Asia unit3 20 500
24 A400 Asia unit1 25 200
25 A400 Asia unit2 20 225
26 A400 Asia unit3 20 500
27 A500 Asia unit1 25 200
28 A500 Asia unit2 20 250
29 A500 Asia unit3 20 500