Home > front end >  How can I move certain columns into rows?
How can I move certain columns into rows?

Time:06-30

Suppose I have the df below. I would like to combine the price columns and value columns so that all prices are in one column and all volumes are in another column. I would also like a third column that identified the price level. For example, unit1, unit2 and unit3.

import numpy as np
import pandas as pd
df = pd.DataFrame(
    {
        'uid': ['U100', 'U200', 'E100', 'E200', 'E300', 'A100', 'A200', 'A300', 'A400', 'A500'],
        'location': ['US', 'US', 'EU', 'EU', 'EU', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia'],
        'unit1_price': [10, 20, 15, 10, 10, 10, 20, 20, 25, 25],
        'unit1_vol': [100, 150, 100, 200, 150, 150, 100, 200, 200, 200],
        'unit2_price': [10, 25, 30, 20, 10, 10, 10, 10, 20, 20],
        'unit2_vol': [200, 200, 150, 300, 300, 200, 150, 225, 225, 250],
        'unit3_price': [0, 0, 0, 20, 20, 20, 20, 20, 20, 20],
        'unit3_vol': [0, 0, 0, 500, 500, 500, 500, 500, 500, 500]
    }
)
df

df1

This is a snippet of what the final df should look like:

df_final


I tried using melt and I think almost have the right answer.

pd.melt(df, id_vars=['uid', 'location'], value_vars=['unit1_price', 'unit1_vol', 'unit2_price', 'unit2_vol', 'unit3_price', 'unit3_vol'])

This is what the partial df looks like with melt:

df2

The problem with the above is that volume and price are in the same column but I want them to be in 2 separate columns.

Did I use the right function?

CodePudding user response:

Try with melt , then pivot after split

s = df.melt(['uid','location'])
s[['unit','type']] = 

s['variable'].str.split('_',expand=True)
s = s.pivot(index = ['uid','location','unit'],columns = ['type'],values = 'value').reset_index()
s
Out[967]: 
type   uid location   unit  price  vol
0     A100     Asia  unit1     10  150
1     A100     Asia  unit2     10  200
2     A100     Asia  unit3     20  500
3     A200     Asia  unit1     20  100
4     A200     Asia  unit2     10  150

CodePudding user response:

Probably, you need to split the df into 3 i.e. unit1,unit2,unit3. Apply the melt function and merge back according to their uid and location.

CodePudding user response:

maybe this:

import pandas as pd
df = pd.DataFrame(
    {
        'uid': ['U100', 'U200', 'E100', 'E200', 'E300', 'A100', 'A200', 'A300', 'A400', 'A500'],
        'location': ['US', 'US', 'EU', 'EU', 'EU', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia'],
        'unit1_price': [10, 20, 15, 10, 10, 10, 20, 20, 25, 25],
        'unit1_vol': [100, 150, 100, 200, 150, 150, 100, 200, 200, 200],
        'unit2_price': [10, 25, 30, 20, 10, 10, 10, 10, 20, 20],
        'unit2_vol': [200, 200, 150, 300, 300, 200, 150, 225, 225, 250],
        'unit3_price': [0, 0, 0, 20, 20, 20, 20, 20, 20, 20],
        'unit3_vol': [0, 0, 0, 500, 500, 500, 500, 500, 500, 500]
    }
)

price = pd.melt(
    df, id_vars=['uid', 'location', 'unit2_vol', 'unit1_vol', 'unit3_vol'], value_vars=['unit1_price', 'unit3_price', 'unit2_price'], var_name="price", value_name="price_value"
)

res = pd.melt(
    price, id_vars=['uid', 'location', 'price', 'price_value'], value_vars=['unit2_vol', 'unit1_vol', 'unit3_vol'], var_name="vol", value_name="vol_value"
)
print(res)

OUTPUT:

     uid location        price  price_value        vol  vol_value
0   U100       US  unit1_price           10  unit2_vol        200
1   U200       US  unit1_price           20  unit2_vol        200
2   E100       EU  unit1_price           15  unit2_vol        150
3   E200       EU  unit1_price           10  unit2_vol        300
4   E300       EU  unit1_price           10  unit2_vol        300
..   ...      ...          ...          ...        ...        ...
85  A100     Asia  unit2_price           10  unit3_vol        500
86  A200     Asia  unit2_price           10  unit3_vol        500
87  A300     Asia  unit2_price           10  unit3_vol        500
88  A400     Asia  unit2_price           20  unit3_vol        500
89  A500     Asia  unit2_price           20  unit3_vol        500

CodePudding user response:

You can form two dataframe using pd.melt first and combine it back to become one dataframe.

df1 = df.melt(id_vars=['uid', 'location'], value_vars=['unit1_price', 'unit2_price', 'unit3_price'],var_name='unit',value_name='price')

df2 = df.melt(id_vars=['uid', 'location'], value_vars=['unit1_vol', 'unit2_vol', 'unit3_vol'],var_name='unit', value_name="volume")

ddf = pd.concat([df1,df2['volume']],axis=1).sort_values(by=['uid','unit'],ignore_index=True)

ddf['unit']=ddf['unit'].str.split('_',expand=True)[0]

CodePudding user response:

You can do:

df_price = df.set_index(['uid','location']).filter(
    regex='price$').stack().rename_axis(
    ['uid', 'location', 'price_unit']).rename('price').reset_index()

df_vol = df.filter(regex='vol$').stack().rename_axis(
    ['', 'vol_unit']).rename('volume').reset_index(level=1).reset_index(drop=True)

df2 = pd.concat([df_price, df_vol], axis=1)
df2['unit'] = df2['price_unit'].apply(lambda x:x.split('_')[0])
df2.drop(['price_unit', 'vol_unit'],axis=1, inplace=True)

print(df2):

     uid location  price  volume   unit
0   U100       US     10     100  unit1
1   U100       US     10     200  unit2
2   U100       US      0       0  unit3
3   U200       US     20     150  unit1
4   U200       US     25     200  unit2
5   U200       US      0       0  unit3
6   E100       EU     15     100  unit1
7   E100       EU     30     150  unit2
8   E100       EU      0       0  unit3
9   E200       EU     10     200  unit1
10  E200       EU     20     300  unit2
11  E200       EU     20     500  unit3
12  E300       EU     10     150  unit1
13  E300       EU     10     300  unit2
14  E300       EU     20     500  unit3
15  A100     Asia     10     150  unit1
16  A100     Asia     10     200  unit2
17  A100     Asia     20     500  unit3
18  A200     Asia     20     100  unit1
19  A200     Asia     10     150  unit2
20  A200     Asia     20     500  unit3
21  A300     Asia     20     200  unit1
22  A300     Asia     10     225  unit2
23  A300     Asia     20     500  unit3
24  A400     Asia     25     200  unit1
25  A400     Asia     20     225  unit2
26  A400     Asia     20     500  unit3
27  A500     Asia     25     200  unit1
28  A500     Asia     20     250  unit2
29  A500     Asia     20     500  unit3

CodePudding user response:

you can do the transformation efficiently in one step from wide to long, using pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.pivot_longer(
     index = ['uid', 'location'], 
     names_to = ('unit', '.value'), 
     names_sep = '_')
)
     uid location   unit  price  vol
0   U100       US  unit1     10  100
1   U200       US  unit1     20  150
2   E100       EU  unit1     15  100
3   E200       EU  unit1     10  200
4   E300       EU  unit1     10  150
5   A100     Asia  unit1     10  150
6   A200     Asia  unit1     20  100
7   A300     Asia  unit1     20  200
8   A400     Asia  unit1     25  200
9   A500     Asia  unit1     25  200
10  U100       US  unit2     10  200
11  U200       US  unit2     25  200
12  E100       EU  unit2     30  150
13  E200       EU  unit2     20  300
14  E300       EU  unit2     10  300
15  A100     Asia  unit2     10  200
16  A200     Asia  unit2     10  150
17  A300     Asia  unit2     10  225
18  A400     Asia  unit2     20  225
19  A500     Asia  unit2     20  250
20  U100       US  unit3      0    0
21  U200       US  unit3      0    0
22  E100       EU  unit3      0    0
23  E200       EU  unit3     20  500
24  E300       EU  unit3     20  500
25  A100     Asia  unit3     20  500
26  A200     Asia  unit3     20  500
27  A300     Asia  unit3     20  500
28  A400     Asia  unit3     20  500
29  A500     Asia  unit3     20  500

.value determines which part of the column remains as a header ( in this case it is price and vol), while the part not associated with .value goes into the unit column. names_sep helps split the labels.

Another option is with pd.wide_to_long, which also allows the reshaping in a single step. It does require some munging on the columns:

i = ['uid', 'location']

temp = df.set_index(i)

# reshape the columns, moving price and vol to the front
temp.columns = temp.columns.str.split('_').str[::-1].str.join('_')

(pd
.wide_to_long(
    temp.reset_index(), 
    i = i, 
    j = 'unit', 
    stubnames = ['price', 'vol'], 
    sep='_', 
    suffix = '. ')
.reset_index()
)
     uid location   unit  price  vol
0   U100       US  unit1     10  100
1   U100       US  unit2     10  200
2   U100       US  unit3      0    0
3   U200       US  unit1     20  150
4   U200       US  unit2     25  200
5   U200       US  unit3      0    0
6   E100       EU  unit1     15  100
7   E100       EU  unit2     30  150
8   E100       EU  unit3      0    0
9   E200       EU  unit1     10  200
10  E200       EU  unit2     20  300
11  E200       EU  unit3     20  500
12  E300       EU  unit1     10  150
13  E300       EU  unit2     10  300
14  E300       EU  unit3     20  500
15  A100     Asia  unit1     10  150
16  A100     Asia  unit2     10  200
17  A100     Asia  unit3     20  500
18  A200     Asia  unit1     20  100
19  A200     Asia  unit2     10  150
20  A200     Asia  unit3     20  500
21  A300     Asia  unit1     20  200
22  A300     Asia  unit2     10  225
23  A300     Asia  unit3     20  500
24  A400     Asia  unit1     25  200
25  A400     Asia  unit2     20  225
26  A400     Asia  unit3     20  500
27  A500     Asia  unit1     25  200
28  A500     Asia  unit2     20  250
29  A500     Asia  unit3     20  500

You can even use the stack option, again all these options are just for you to do the transformation efficiently:

i = ['uid', 'location']

temp = df.set_index(i)

# create a MultiIndex
temp.columns = temp.columns.str.split('_', expand = True)
temp.columns.names = ['unit', None]

temp.stack('unit').reset_index()

     uid location   unit  price  vol
0   U100       US  unit1     10  100
1   U100       US  unit2     10  200
2   U100       US  unit3      0    0
3   U200       US  unit1     20  150
4   U200       US  unit2     25  200
5   U200       US  unit3      0    0
6   E100       EU  unit1     15  100
7   E100       EU  unit2     30  150
8   E100       EU  unit3      0    0
9   E200       EU  unit1     10  200
10  E200       EU  unit2     20  300
11  E200       EU  unit3     20  500
12  E300       EU  unit1     10  150
13  E300       EU  unit2     10  300
14  E300       EU  unit3     20  500
15  A100     Asia  unit1     10  150
16  A100     Asia  unit2     10  200
17  A100     Asia  unit3     20  500
18  A200     Asia  unit1     20  100
19  A200     Asia  unit2     10  150
20  A200     Asia  unit3     20  500
21  A300     Asia  unit1     20  200
22  A300     Asia  unit2     10  225
23  A300     Asia  unit3     20  500
24  A400     Asia  unit1     25  200
25  A400     Asia  unit2     20  225
26  A400     Asia  unit3     20  500
27  A500     Asia  unit1     25  200
28  A500     Asia  unit2     20  250
29  A500     Asia  unit3     20  500
  • Related