here is an example df, as below:
df1 = pd.DataFrame(
data={"data_provider": ["prov_a", "prov_a", "prov_a", "prov_a", "prov_a", "prov_a"],
"indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
"unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
"year": ["2017", "2018","2019", "2017","2018","2019"],
"country1": [1, 2, 3, 2, 4, 6],
"country2": [4, 5, 6, 40, 50, 60]}
)
df1MultiIndex = df1.pivot_table(
index=['data_provider', 'indicator', 'unit'],
columns='year'
)
df1MultiIndex.columns.names = ['Country', 'Year']
to view df1MultiIndex
, it shows as
Country country1 country2
Year 2017 2018 2019 2017 2018 2019
data_provider indicator unit
prov_a ind_a EUR 1 2 3 4 5 6
ind_b EUR 2 4 6 40 50 60
if you check its column index, it shows as
MultiIndex([('country1', '2017'),
('country1', '2018'),
('country1', '2019'),
('country2', '2017'),
('country2', '2018'),
('country2', '2019')],
names=['Country', 'Year'])
here comes my question:
how can I get the column labels by column names?
For example, I can get a list like ['country1', 'country1', 'country1', 'country2', 'country2', 'country2']
by ‘Country'. Or get a list-likst ['country1', 'country2']
by Country
.
df.columns.get_level_values(level=0)
may also achieve my goal, but I wish to get the same result by column names, especially when you are dealing with a df having a bunch of columns and have no idea of its level but you do have rename column names.
CodePudding user response:
See the documentation for pd.MultiIndex.get_level_values(level)
:
level
is either the integer position of the level in the MultiIndex, or the name of the level.
So, this:
df1MultiIndex.columns.get_level_values(level=0)
and:
df1MultiIndex.columns.get_level_values(level='Country')
give the same result:
Index(['country1', 'country1', 'country1', 'country2', 'country2', 'country2'],
dtype='object', name='Country')
df.rename
allows the same choice between integer position and name. E.g.
df_renamed_country = df1MultiIndex.rename(columns=str.upper, level='Country')
df_renamed_country.columns
MultiIndex([('COUNTRY1', '2017'),
('COUNTRY1', '2018'),
('COUNTRY1', '2019'),
('COUNTRY2', '2017'),
('COUNTRY2', '2018'),
('COUNTRY2', '2019')],
names=['Country', 'Year'])