I have a pandas dataframe with column names like:
id name ... class maths_marks1 maths_marks2 eng_marks1 eng_marks2
Now I want to create Multiindex columns like: first column is id
so it remains as is, then maths
and eng
in level0 while marks1
and marks2
in level1, and the rest of the columns under student_data
. So my final dataframe should look like:
student_data maths eng
id name ... class marks1 marks2 marks1 marks2
Do note that ...
indicate that there are a lot of columns between name
and class
, therefore I would like to specify them by position and not name (something like starting from the second column to the column just before the one which contains maths
).
What's the best way to achieve this?
CodePudding user response:
Prepend student_data_
in each col which is not contain '_' and split columns to convert them as MultiIndex
:
df.columns = ['student_data_' col if '_' not in col else col
for col in df.rename(columns={'id': '_id'}).columns]
df.columns = df.columns.str.rsplit('_', 1).map(tuple)
# Before
>>> df
id name class maths_marks1 maths_marks2 eng_marks1 eng_marks2
0 1 Louis CP 20 19 18 17
# After
>>> df
student_data maths eng
id name class marks1 marks2 marks1 marks2
0 1 Louis CP 20 19 18 17