Home > Back-end >  How to add an indexing level to a subset of columns in a dataframe in pandas
How to add an indexing level to a subset of columns in a dataframe in pandas

Time:06-15

I found how to add levels to all the columns but not on specific subsets.

I explain my problem: say that I have a pandas dataframe like this

            a     b     c      d
myIndex    
  0.0      0.1   0.2  -2.0   -0.8
  0.1      0.7   1.1   9.0    0.8
  0.2     -0.3   1.0   2.3   -0.6

and that I want to add an additional column index level based on columns subsets, e.g. I want something like this:

                A          B
             a     b     c      d
myIndex    
  0.0      0.1   0.2  -2.0   -0.8
  0.1      0.7   1.1   9.0    0.8
  0.2     -0.3   1.0   2.3   -0.6

How to do that?

CodePudding user response:

Maybe something like Multiindex can help:

df = pd.DataFrame(np.random.randn(8, 4), columns=pd.MultiIndex.from_tuples([
    ('A', 'a'), ('A', 'b'), ('B', 'c'), ('B', 'd')], names=['1st', '2nd']))

OUTPUT

1st         A                   B
2nd         a         b         c         d
0    1.187084  0.286291  0.151277  0.111996
1    0.383498  0.282514 -2.037380 -0.792589
2    0.193105 -0.421383 -0.956383 -0.667760
3   -1.646689  0.836364  0.270066 -1.667103
4    0.315428 -1.478783  1.442501  0.362000
5    0.763970  0.028874  0.149660  0.726050
6    0.617114 -0.142764 -1.179998 -0.240495
7    0.437346 -0.545978 -0.800455 -0.117596

CodePudding user response:

General Solution

Define a mapping dictionary which maps the level zero column values to level 1 column values, then flatten the dictionary into tuples and create a multiindex

d = {'A': ('a', 'b'), 'B': ('c', 'd')}
df.columns = pd.MultiIndex.from_tuples((k, c) for k, v in d.items() for c in v)

           A         B     
           a    b    c    d
myIndex                    
0.0      0.1  0.2 -2.0 -0.8
0.1      0.7  1.1  9.0  0.8
0.2     -0.3  1.0  2.3 -0.6

CodePudding user response:

In case someone needs it, I solved in this way. I searched quite a lot and mine does not seem an answered question.

More elegant answers than this are welcomed.

col_subset_1 = ['a', 'b']
col_subset_2 = ['c', 'd']

col_subset_1_label = list(zip(['A']*len(col_subset_1), col_subset_1))
col_subset_2_label = list(zip(['B']*len(col_subset_2), col_subset_2))

df.columns = pd.MultiIndex.from_tuples([*col_subset_1_label,*col_subset_1_label])

This solution should be pretty flexible and it should work with every columns subsets (provided that the selected columns exist in the current dataframe).

  • Related