I would like a support on doing melt without providing all the column names. I have a dataframe with 5000 columns and it is quite dynamic. Hence, would like to see if there is any way to do melt without providing columns (using df.columns
).
# Sample df
#| sub1 sub2 subN
#| Student max min mean max min mean max min mean
#| 0 Joy 7 2 0 8 3 1 7 2 0
#| 1 Red 9 2 5 8 3 4 7 1 0
#| 2 CTL 5 0 4 4 1 7 7 2 5
I would like to have
# Sample df
#|
#| Student Sub type
#| 0 Joy sub1 max 7
#| 0 Joy sub1 min 2
#| 0 Joy sub1 mean 0
#| 0 Joy sub2 max 8
#| 0 Joy sub2 min 3
#| 0 Joy sub2 mean 1
#| 0 Joy subN max 7
#| 0 Joy subN min 2
#| 0 Joy subN mean 0
#| 1 Red ........
CodePudding user response:
Nevermind. Found out just using melt()
without any arguments did the trick
CodePudding user response:
It depends of data:
print (df)
sub1 sub2 subN
Student max min mean max min mean max min mean
0 Joy 7 2 0 8 3 1 7 2 0
1 Red 9 2 5 8 3 4 7 1 0
2 CTL 5 0 4 4 1 7 7 2 5
Here is not clear how looks first column, so test it:
print (df.columns[:5])
MultiIndex([( '', 'Student'),
('sub1', 'max'),
('sub1', 'min'),
('sub1', 'mean'),
('sub2', 'max')],
)
Incorrect ouput:
df2 = df.melt()
print (df2)
variable_0 variable_1 value
0 Student Joy
1 Student Red
2 Student CTL
3 sub1 max 7
4 sub1 max 9
5 sub1 max 5
6 sub1 min 2
7 sub1 min 2
8 sub1 min 0
9 sub1 mean 0
10 sub1 mean 5
11 sub1 mean 4
12 sub2 max 8
13 sub2 max 8
14 sub2 max 4
15 sub2 min 3
16 sub2 min 3
17 sub2 min 1
18 sub2 mean 1
19 sub2 mean 4
20 sub2 mean 7
21 subN max 7
22 subN max 7
23 subN max 7
24 subN min 2
25 subN min 1
26 subN min 2
27 subN mean 0
28 subN mean 0
29 subN mean 5
Because Student
is part of MultiIndex
:
df1 = df.melt(id_vars = list(df.columns[:1]))
print (df1)
(, Student) variable_0 variable_1 value
0 Joy sub1 max 7
1 Red sub1 max 9
2 CTL sub1 max 5
3 Joy sub1 min 2
4 Red sub1 min 2
5 CTL sub1 min 0
6 Joy sub1 mean 0
7 Red sub1 mean 5
8 CTL sub1 mean 4
9 Joy sub2 max 8
10 Red sub2 max 8
11 CTL sub2 max 4
12 Joy sub2 min 3
13 Red sub2 min 3
14 CTL sub2 min 1
15 Joy sub2 mean 1
16 Red sub2 mean 4
17 CTL sub2 mean 7
18 Joy subN max 7
19 Red subN max 7
20 CTL subN max 7
21 Joy subN min 2
22 Red subN min 1
23 CTL subN min 2
24 Joy subN mean 0
25 Red subN mean 0
26 CTL subN mean 5
But if Student
is index:
print (df)
sub1 sub2 subN
max min mean max min mean max min mean
Joy 7 2 0 8 3 1 7 2 0
Red 9 2 5 8 3 4 7 1 0
CTL 5 0 4 4 1 7 7 2 5
print (df.columns[:5])
MultiIndex([('sub1', 'max'),
('sub1', 'min'),
('sub1', 'mean'),
('sub2', 'max'),
('sub2', 'min')],
)
Need ignore_index=False
:
df2 = df.melt(ignore_index=False)
print (df2)
variable_0 variable_1 value
Joy sub1 max 7
Red sub1 max 9
CTL sub1 max 5
Joy sub1 min 2
Red sub1 min 2
CTL sub1 min 0
Joy sub1 mean 0
Red sub1 mean 5
CTL sub1 mean 4
Joy sub2 max 8
Red sub2 max 8
CTL sub2 max 4
Joy sub2 min 3
Red sub2 min 3
CTL sub2 min 1
Joy sub2 mean 1
Red sub2 mean 4
CTL sub2 mean 7
Joy subN max 7
Red subN max 7
CTL subN max 7
Joy subN min 2
Red subN min 1
CTL subN min 2
Joy subN mean 0
Red subN mean 0
CTL subN mean 5