Home > Enterprise >  get columns labels by column names in dataframe
get columns labels by column names in dataframe

Time:12-18

here is an example df, as below:

df1 = pd.DataFrame(
    data={"data_provider": ["prov_a", "prov_a", "prov_a", "prov_a", "prov_a", "prov_a"],
          "indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
          "unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
          "year": ["2017", "2018","2019", "2017","2018","2019"],
          "country1": [1, 2, 3, 2, 4, 6],
          "country2": [4, 5, 6, 40, 50, 60]}
)

df1MultiIndex = df1.pivot_table(
    index=['data_provider', 'indicator', 'unit'],
    columns='year'
)
df1MultiIndex.columns.names = ['Country', 'Year']

to view df1MultiIndex, it shows as

Country                      country1           country2          
Year                             2017 2018 2019     2017 2018 2019
data_provider indicator unit                                      
prov_a        ind_a     EUR         1    2    3        4    5    6
              ind_b     EUR         2    4    6       40   50   60

if you check its column index, it shows as

MultiIndex([('country1', '2017'),
            ('country1', '2018'),
            ('country1', '2019'),
            ('country2', '2017'),
            ('country2', '2018'),
            ('country2', '2019')],
           names=['Country', 'Year'])

here comes my question: how can I get the column labels by column names? For example, I can get a list like ['country1', 'country1', 'country1', 'country2', 'country2', 'country2'] by ‘Country'. Or get a list-likst ['country1', 'country2'] by Country.

df.columns.get_level_values(level=0) may also achieve my goal, but I wish to get the same result by column names, especially when you are dealing with a df having a bunch of columns and have no idea of its level but you do have rename column names.

CodePudding user response:

See the documentation for pd.MultiIndex.get_level_values(level):

level is either the integer position of the level in the MultiIndex, or the name of the level.

So, this:

df1MultiIndex.columns.get_level_values(level=0)

and:

df1MultiIndex.columns.get_level_values(level='Country')

give the same result:

Index(['country1', 'country1', 'country1', 'country2', 'country2', 'country2'], 
      dtype='object', name='Country')

df.rename allows the same choice between integer position and name. E.g.

df_renamed_country = df1MultiIndex.rename(columns=str.upper, level='Country')
df_renamed_country.columns

MultiIndex([('COUNTRY1', '2017'),
            ('COUNTRY1', '2018'),
            ('COUNTRY1', '2019'),
            ('COUNTRY2', '2017'),
            ('COUNTRY2', '2018'),
            ('COUNTRY2', '2019')],
           names=['Country', 'Year'])
  • Related