Home > other >  Pandas transpose multiple columns for every row in dataframe with 2 header rows
Pandas transpose multiple columns for every row in dataframe with 2 header rows

Time:02-03

I have a csv (see image below) with the first 2 rows as the column headers and I need the first header row to be the header and second header row to be transposed for every row.

enter image description here

I tried transposing and using the pivot functions but those didn't work. I also tried the pivot_table but that wasn't working either.

The expected output is:

enter image description here

CodePudding user response:

You have to mix melt to flat your dataframe then pivot_table to reshape it according your expected output:

>>> (df.melt(var_name=['Variable', 'Date'], value_name='Value', ignore_index=False)
       .pivot_table(index=df.index.names ['Date'], columns='Variable', values='Value')
       .reset_index().rename_axis(columns=None))

  Brand Family Brand Channel Product        Date  Dollars  Quantity
0            A     A     Reg      A1  2022-01-12        4         1
1            A     A     Reg      A1  2022-01-13        5         2
2            A     A     Reg      A1  2022-01-14        6         3
3            A     A     Reg      A2  2022-01-12       14        11
4            A     A     Reg      A2  2022-01-13       15        12
5            A     A     Reg      A2  2022-01-14       16        13

Input:

>>> df

                                     Quantity                          Dollars                      
                                   2022-01-12 2022-01-13 2022-01-14 2022-01-12 2022-01-13 2022-01-14
Brand Family Brand Channel Product                                                                  
A            A     Reg     A1               1          2          3          4          5          6
                           A2              11         12         13         14         15         16

CodePudding user response:

I just assumed your CSV file is like the one I have inline in my code.

Solution

First, read the file and ignore the headers.

import io
import pandas as pd
import numpy as np

csv_file = io.StringIO('''\
,,,,Quantity,Quantity,Quantity,Dollars,Dollars,Dollars
Brand Family,Brand,Channel,Product,2022-01-12,2022-01-13,2022-01-14,2022-01-12,2022-01-13,2022-01-14
Brand A,Brand A,Reg,A1,18,47,41,216,164,492
Brand A,Brand A,Reg,A2,9,23,20,108,276,240
Brand A,Brand A,Reg,A3,28,80,82,392,1120,1148
Brand A,Brand A,Reg,A4,,,,,,
Brand A,Brand A,Reg,A5,7,15,13,98,210,182\
''')

df = pd.read_csv(csv_file, header=None).replace(np.nan, 0)
print(df)
              0        1        2        3           4           5           6           7           8           9
0             0        0        0        0    Quantity    Quantity    Quantity     Dollars     Dollars     Dollars
1  Brand Family    Brand  Channel  Product  2022-01-12  2022-01-13  2022-01-14  2022-01-12  2022-01-13  2022-01-14
2       Brand A  Brand A      Reg       A1          18          47          41         216         164         492
3       Brand A  Brand A      Reg       A2           9          23          20         108         276         240
4       Brand A  Brand A      Reg       A3          28          80          82         392        1120        1148
5       Brand A  Brand A      Reg       A4           0           0           0           0           0           0
6       Brand A  Brand A      Reg       A5           7          15          13          98         210         182

And let's divide it into two pieces, the left part

left = df.iloc[2:, :4].set_axis(df.iloc[1, :4], axis='columns')
print(left)
1 Brand Family    Brand Channel Product
2      Brand A  Brand A     Reg      A1
3      Brand A  Brand A     Reg      A2
4      Brand A  Brand A     Reg      A3
5      Brand A  Brand A     Reg      A4
6      Brand A  Brand A     Reg      A5

and the right part.

right = df.iloc[:, 4:].T
print(right)
          0           1    2    3     4  5    6
4  Quantity  2022-01-12   18    9    28  0    7
5  Quantity  2022-01-13   47   23    80  0   15
6  Quantity  2022-01-14   41   20    82  0   13
7   Dollars  2022-01-12  216  108   392  0   98
8   Dollars  2022-01-13  164  276  1120  0  210
9   Dollars  2022-01-14  492  240  1148  0  182

Now, pivot the right part

pivoted = right.pivot(index=0, columns=[1]).T.reset_index(level=1, names=[0, 'Date'])
print(pivoted)

and note the index is consistent with the left part's.

0        Date Dollars Quantity
2  2022-01-12     216       18
2  2022-01-13     164       47
2  2022-01-14     492       41
3  2022-01-12     108        9
3  2022-01-13     276       23
3  2022-01-14     240       20
4  2022-01-12     392       28
4  2022-01-13    1120       80
4  2022-01-14    1148       82
5  2022-01-12       0        0
5  2022-01-13       0        0
5  2022-01-14       0        0
6  2022-01-12      98        7
6  2022-01-13     210       15
6  2022-01-14     182       13

Finally, join them by the common index.

df = left.join(pivoted).reset_index(drop=True)
print(df)
   Brand Family    Brand Channel Product        Date Dollars Quantity
0       Brand A  Brand A     Reg      A1  2022-01-12     216       18
1       Brand A  Brand A     Reg      A1  2022-01-13     164       47
2       Brand A  Brand A     Reg      A1  2022-01-14     492       41
3       Brand A  Brand A     Reg      A2  2022-01-12     108        9
4       Brand A  Brand A     Reg      A2  2022-01-13     276       23
5       Brand A  Brand A     Reg      A2  2022-01-14     240       20
6       Brand A  Brand A     Reg      A3  2022-01-12     392       28
7       Brand A  Brand A     Reg      A3  2022-01-13    1120       80
8       Brand A  Brand A     Reg      A3  2022-01-14    1148       82
9       Brand A  Brand A     Reg      A4  2022-01-12       0        0
10      Brand A  Brand A     Reg      A4  2022-01-13       0        0
11      Brand A  Brand A     Reg      A4  2022-01-14       0        0
12      Brand A  Brand A     Reg      A5  2022-01-12      98        7
13      Brand A  Brand A     Reg      A5  2022-01-13     210       15
14      Brand A  Brand A     Reg      A5  2022-01-14     182       13

CodePudding user response:

You could also use:

df1 = pd.read_csv('path/file.csv', header=[0, 1], index_col=[0,1,2,3])

df1.stack().reset_index()

  Brand Family Brand Channel Product        Date  Quantity  Dollars
0            A     A     Reg      A1  2022-01-12         1        4
1            A     A     Reg      A1  2022-01-13         2        5
2            A     A     Reg      A1  2022-01-14         3        6
3            A     A     Reg      A2  2022-01-12        11       14
4            A     A     Reg      A2  2022-01-13        12       15
5            A     A     Reg      A2  2022-01-14        13       16

where df1 is your data

  • Related