I have a Dataframe that has a multi level column index. Please see the code below to construct the DataFrame. I have replaced all the data with random numbers to prevent sensitive information being leaked.
import pandas as pd
import numpy as np
import random
pd.options.display.max_columns=None
multi_index = [('Part Information', 'Brand'),
('Part Information', 'Model'),
('Part Information', 'Part Grouping'),
('Part Information', 'Part Desc'),
('Part Information', 'VWG Part Number(s)'),
('Part Information', 'VWG Retail Price'),
('Part Information', 'Trade Discount'),
('Part Information', 'VWG Trade Price'),
('Part Information', 'Is VWG Part Common or Unique?'),
('Part Information', 'VWG Volume'),
('Part Information', 'Competitor'),
('Q1 2018', 'Competitor Part Number'),
('Q1 2018', 'Competitor Brand'),
('Q1 2018', 'Competitor Retail Price'),
('Q1 2018', 'Competitor Trade Price'),
('Q2 2018', 'Competitor Part Number'),
('Q2 2018', 'Competitor Brand'),
('Q2 2018', 'Competitor Retail Price'),
('Q2 2018', 'Competitor Trade Price'),
('Q3 2018', 'Competitor Part Number'),
('Q3 2018', 'Competitor Brand'),
('Q3 2018', 'Competitor Retail Price'),
('Q3 2018', 'Competitor Trade Price'),
('Q4 2018', 'Competitor Part Number'),
('Q4 2018', 'Competitor Brand'),
('Q4 2018', 'Competitor Retail Price'),
('Q4 2018', 'Competitor Trade Price'),
('Q2 2019', 'Competitor Part Number'),
('Q2 2019', 'Competitor Brand'),
('Q2 2019', 'Competitor Retail Price'),
('Q2 2019', 'Competitor Trade Price'),
('Q3 2019', 'Competitor Part Number'),
('Q3 2019', 'Competitor Brand'),
('Q3 2019', 'Competitor Retail Price'),
('Q3 2019', 'Competitor Trade Price'),
('Q4 2019', 'Competitor Part Number'),
('Q4 2019', 'Competitor Brand'),
('Q4 2019', 'Competitor Retail Price'),
('Q4 2019', 'Competitor Trade Price')]
pd.MultiIndex.from_tuples(multi_index)
df = pd.DataFrame(np.random.randn(10, 39))
df.columns = pd.MultiIndex.from_tuples(multi_index)
The output dataframe is shown in the screenshots below (apologies for the multiple screenshots).
As you can see there are are repeated columns that I would like to unpivot. Furthermore, I would like to add in a 'QUARTER'
column instead of having a multi index column structure. So I would like the dataframe to be manipulated as follows:
cols = ['Brand', 'Model', 'Part Grouping', 'Part Desc', 'VWG Part Number(s)',
'VWG Retail Price', 'Trade Discount', 'VWG Trade Price',
'Is VWG Part Common or Unique?', 'VWG Volume', 'Competitor', 'QUARTER',
'Competitor Part Number', 'Competitor Brand', 'Competitor Retail Price',
'Competitor Trade Price']
df_new = pd.DataFrame(np.random.randn(10, 16), columns=cols)
df_new.loc[0:4, 'QUARTER'] = 'Q1 2018'
df_new.loc[4:8, 'QUARTER'] = 'Q2 2018'
df_new.loc[9, 'QUARTER'] = '...'
How to I unpivot the column categories in Level 0 of the column hierarchy??? Do I use pd.melt()
or pd.stack()/unstack()
I would greatly appreciate any help and if you need more information let me know.
CodePudding user response:
IIUC, you can split the DataFrame in two, unstack the right part and join:
(df['Part Information']
.join(df.drop(columns='Part Information', level=0)
.stack(0)
.rename_axis((None, 'Quarter'))
.reset_index(1))
)
output:
Brand Model Part Grouping Part Desc VWG Part Number(s) VWG Retail Price Trade Discount VWG Trade Price Is VWG Part Common or Unique? VWG Volume Competitor Quarter Competitor Brand Competitor Part Number Competitor Retail Price Competitor Trade Price
0 1.163696 0.789552 -1.673217 -0.256159 0.299669 -1.918318 1.741297 -0.005605 1.085802 -0.775250 -0.800543 Q1 2018 0.668761 -0.266060 -1.018759 -0.755990
0 1.163696 0.789552 -1.673217 -0.256159 0.299669 -1.918318 1.741297 -0.005605 1.085802 -0.775250 -0.800543 Q2 2018 1.386664 -1.832704 1.325866 -0.123179
0 1.163696 0.789552 -1.673217 -0.256159 0.299669 -1.918318 1.741297 -0.005605 1.085802 -0.775250 -0.800543 Q2 2019 -0.612474 -0.250223 -1.299746 -0.870354
0 1.163696 0.789552 -1.673217 -0.256159 0.299669 -1.918318 1.741297 -0.005605 1.085802 -0.775250 -0.800543 Q3 2018 -1.553103 1.462980 1.578326 0.417629
...