Home > database >  How to melt 2 level of columns in pandas without providing the name of columns?
How to melt 2 level of columns in pandas without providing the name of columns?

Time:10-19

I would like a support on doing melt without providing all the column names. I have a dataframe with 5000 columns and it is quite dynamic. Hence, would like to see if there is any way to do melt without providing columns (using df.columns).

# Sample df
#|               sub1             sub2              subN
#|   Student     max  min  mean   max   min mean   max   min  mean
#| 0     Joy     7    2    0      8    3    1      7    2    0
#| 1     Red     9    2    5      8    3    4      7    1    0
#| 2     CTL     5    0    4      4    1    7      7    2    5

I would like to have

# Sample df
#|            
#|   Student Sub  type
#| 0     Joy sub1 max   7
#| 0     Joy sub1 min   2
#| 0     Joy sub1 mean  0
#| 0     Joy sub2 max   8
#| 0     Joy sub2 min   3
#| 0     Joy sub2 mean  1
#| 0     Joy subN max   7
#| 0     Joy subN min   2
#| 0     Joy subN mean  0
#| 1     Red ........

CodePudding user response:

Nevermind. Found out just using melt() without any arguments did the trick

CodePudding user response:

It depends of data:

print (df)
          sub1          sub2          subN         
  Student  max min mean  max min mean  max min mean
0     Joy    7   2    0    8   3    1    7   2    0
1     Red    9   2    5    8   3    4    7   1    0
2     CTL    5   0    4    4   1    7    7   2    5

Here is not clear how looks first column, so test it:

print (df.columns[:5])
MultiIndex([(    '', 'Student'),
            ('sub1',     'max'),
            ('sub1',     'min'),
            ('sub1',    'mean'),
            ('sub2',     'max')],
           )

Incorrect ouput:

df2 = df.melt()
print (df2)
   variable_0 variable_1 value
0                Student   Joy
1                Student   Red
2                Student   CTL
3        sub1        max     7
4        sub1        max     9
5        sub1        max     5
6        sub1        min     2
7        sub1        min     2
8        sub1        min     0
9        sub1       mean     0
10       sub1       mean     5
11       sub1       mean     4
12       sub2        max     8
13       sub2        max     8
14       sub2        max     4
15       sub2        min     3
16       sub2        min     3
17       sub2        min     1
18       sub2       mean     1
19       sub2       mean     4
20       sub2       mean     7
21       subN        max     7
22       subN        max     7
23       subN        max     7
24       subN        min     2
25       subN        min     1
26       subN        min     2
27       subN       mean     0
28       subN       mean     0
29       subN       mean     5

Because Student is part of MultiIndex:

df1 = df.melt(id_vars = list(df.columns[:1]))
print (df1)
   (, Student) variable_0 variable_1  value
0          Joy       sub1        max      7
1          Red       sub1        max      9
2          CTL       sub1        max      5
3          Joy       sub1        min      2
4          Red       sub1        min      2
5          CTL       sub1        min      0
6          Joy       sub1       mean      0
7          Red       sub1       mean      5
8          CTL       sub1       mean      4
9          Joy       sub2        max      8
10         Red       sub2        max      8
11         CTL       sub2        max      4
12         Joy       sub2        min      3
13         Red       sub2        min      3
14         CTL       sub2        min      1
15         Joy       sub2       mean      1
16         Red       sub2       mean      4
17         CTL       sub2       mean      7
18         Joy       subN        max      7
19         Red       subN        max      7
20         CTL       subN        max      7
21         Joy       subN        min      2
22         Red       subN        min      1
23         CTL       subN        min      2
24         Joy       subN       mean      0
25         Red       subN       mean      0
26         CTL       subN       mean      5

But if Student is index:

print (df)
    sub1          sub2          subN         
     max min mean  max min mean  max min mean
Joy    7   2    0    8   3    1    7   2    0
Red    9   2    5    8   3    4    7   1    0
CTL    5   0    4    4   1    7    7   2    5

print (df.columns[:5])
MultiIndex([('sub1',  'max'),
            ('sub1',  'min'),
            ('sub1', 'mean'),
            ('sub2',  'max'),
            ('sub2',  'min')],
           )

Need ignore_index=False:

df2 = df.melt(ignore_index=False)
print (df2)
    variable_0 variable_1  value
Joy       sub1        max      7
Red       sub1        max      9
CTL       sub1        max      5
Joy       sub1        min      2
Red       sub1        min      2
CTL       sub1        min      0
Joy       sub1       mean      0
Red       sub1       mean      5
CTL       sub1       mean      4
Joy       sub2        max      8
Red       sub2        max      8
CTL       sub2        max      4
Joy       sub2        min      3
Red       sub2        min      3
CTL       sub2        min      1
Joy       sub2       mean      1
Red       sub2       mean      4
CTL       sub2       mean      7
Joy       subN        max      7
Red       subN        max      7
CTL       subN        max      7
Joy       subN        min      2
Red       subN        min      1
CTL       subN        min      2
Joy       subN       mean      0
Red       subN       mean      0
CTL       subN       mean      5
  • Related