Home > Back-end >  Unpivot multiple variables in a python pandas dataframe
Unpivot multiple variables in a python pandas dataframe

Time:08-10

I've got a big dataframe which shows the amount of each product and their costs for different products. However, I want to transform (unpivot) the dataframe into a long dataframe with each product name as an ID and their amounts and costs in two different columns. I've tried the ‍pd.melt‍‍‍ and ireshape functions, but neither seems to work.

Here is an example of what I am trying to do. Here is my table

df = pd.DataFrame({ 'Year': [year1, year2,year3],                                                                       
               'A': [200,300,400],
               'B': [500,600,300],
               'C': [450,369,235],
               'A cost': [7000, 4000, 7000 ],
               'B cost': [9000, 4000, 6000],
               'C cost': [1100, 4300, 2320],
               })
print(df)

current data frame:

enter image description here

Desired data frame:

enter image description here

CodePudding user response:

Create MultiIndex by splitting columnsnames and setting Year to index, then replace missing values with amount and reshape by DataFrame.stack:

df = pd.DataFrame({ 'Year': 'year1,year2,year3'.split(','),
               'A': [200,300,400],
               'B': [500,600,300],
               'C': [450,369,235],
               'A cost': [7000, 4000, 7000 ],
               'B cost': [9000, 4000, 6000],
               'C cost': [1100, 4300, 2320],
               })
print(df)
    
df1 = df.set_index('Year')
df1.columns = df1.columns.str.split(expand=True)

f = lambda x: 'amount' if pd.isna(x) else x
df1 = df1.rename(columns=f).stack(0).rename_axis(['Year','product']).reset_index()
print (df1)
    Year product  amount  cost
0  year1       A     200  7000
1  year1       B     500  9000
2  year1       C     450  1100
3  year2       A     300  4000
4  year2       B     600  4000
5  year2       C     369  4300
6  year3       A     400  7000
7  year3       B     300  6000
8  year3       C     235  2320

CodePudding user response:

This can be achieved quite simply with janitor's pivot_longer:

# pip install janitor
import janitor

out = (df
 .pivot_longer(index='Year', names_to=('product', '.value'),
               names_pattern=r'(\S )\s*(\S*)')
 .rename(columns={'': 'amount'})
)

output:

    Year product  amount  cost
0  year1       A     200  7000
1  year2       A     300  4000
2  year3       A     400  7000
3  year1       B     500  9000
4  year2       B     600  4000
5  year3       B     300  6000
6  year1       C     450  1100
7  year2       C     369  4300
8  year3       C     235  2320
  • Related