Home > Software engineering >  Double "melt" in a pandas dataframe from Excel file
Double "melt" in a pandas dataframe from Excel file

Time:12-14

I am reading an excel file in pandas with two levels for the columns. I am using Python 3.7

Example Excel file

    Unnamed: 0  Unnamed: 1  Unnamed: 2  2021-01-01 2021-01-02   2021-01-03  2021-01-04  2021-01-05 
0   ProjectNr   Name        Sector      categorya   categoryb   categoryc   categoryd   categorye
1   1           aaa          A1         14.995      14.995      14.995       14.995     14.995
2   2           aaa          A2         7.4975      7.4975      7.4975       NaN        NaN
3   3           aaa          A3         NaN         11.996      11.996      11.996      NaN

I would like to transform the row "category" and "date" in to different columns of the data frame. I try with melt but I do not know how to do it the second melt or the melt to the combination row headers.

I would like to get something like

ProjectNr   Name    Sector  Category    date         Price
1           aaa     A1      categorya   01/01/2021   € 15,00 
1           aaa     A1      categoryb   02/01/2021   € 15,00 
1           aaa     A1      categoryc   03/01/2021   € 15,00 
1           aaa     A1      categoryd   04/01/2021   € 15,00 
1           aaa     A1      categorye   05/01/2021   € 15,00 
2           aaa     A2      categorya   01/01/2021   € 7,50 
2           aaa     A2      categoryb   02/01/2021   € 7,50 
2           aaa     A2      categoryc   03/01/2021   € 7,50 
2           aaa     A2      categoryd   04/01/2021  
2           aaa     A2      categorye   05/01/2021  
3           aaa     A3      categorya   01/01/2021  
3           aaa     A3      categoryb   02/01/2021   € 12,00 
3           aaa     A3      categoryc   03/01/2021   € 12,00 
3           aaa     A3      categoryd   04/01/2021   € 12,00 
3           aaa     A3      categorye   05/01/2021  

If I create the df with header=[0, 1] melt broke with the name of the columns. Without header, melt only work for one column. Example:

    Unnamed: 0  Unnamed: 1  Unnamed: 2  dt          value
0   ProjectNr   Name        Sector      2021-01-01  categorya
1   1           aaa         A1          2021-01-01  14.995
2   2           aaa         A2          2021-01-01  7.4975
3   3           aaa         A3          2021-01-01  NaN
4   ProjectNr   Name        Sector      2021-01-02  categoryb
5   1           aaa         A1          2021-01-02  14.995
6   2           aaa         A2          2021-01-02  7.4975

How can I make a melt for the two levels of the headers?

CodePudding user response:

First of all, we need to read the excel file properly

df = pd.read_excel('~/test.xlsx', header=[0, 1], index_col=[0, 1, 2])

Stack using the MultiIndex level that you need keeping the NaNs and then reset the index

df = df.stack(level=[1, 0], dropna=False).reset_index()

Finally, rename the columns

df.columns = ['ProjectNr', 'Name', 'Sector', 'Category', 'date', 'Price']
  • Related