Home > database >  How to add a level to a dataframe in pandas where the new level comes from a column of another dataf
How to add a level to a dataframe in pandas where the new level comes from a column of another dataf

Time:10-30

I have a pandas dataframe that I have named as data_df, Which looks like this.

        A             B             C               D
    a1 a2  a3     b1 b2 b3      c1 c2 c3        d1 d2 d3
1   1   2   3     5   6  7      3  4   5        10  1  8
2   0   5   4     8   7  9      1  5   7        8   6  1

I have another dataframe named universe_df which looks like this

Index Symbol Category
1       A       X
2       B       Y
3       C       Y
4       D       Z

I want the output to look like this:

       X                     Y                    Z
       A               B           C              D
    a1 a2  a3       b1 b2 b3    c1 c2 c3       d1 d2 d3
1   1  2   3         5  6  7     3 4   5        10 1  8
2   0  5   4         8  7  9     1  5  7         8 6  1


How should I go about it? Thanks in advance.

CodePudding user response:

Sample data:

data_df = pd.DataFrame({('A', 'a1'): {1: 1, 2: 0}, ('A', 'a2'): {1: 2, 2: 5}, ('A', 'a3'): {1: 3, 2: 4}, ('B', 'b1'): {1: 5, 2: 8}, ('B', 'b2'): {1: 6, 2: 7}, ('B', 'b3'): {1: 7, 2: 9}, ('C', 'c1'): {1: 3, 2: 1}, ('C', 'c2'): {1: 4, 2: 5}, ('C', 'c3'): {1: 5, 2: 7}, ('D', 'd1'): {1: 10, 2: 8}, ('D', 'd2'): {1: 1, 2: 6}, ('D', 'd3'): {1: 8, 2: 1}})
print (data_df)
   A        B        C         D      
  a1 a2 a3 b1 b2 b3 c1 c2 c3  d1 d2 d3
1  1  2  3  5  6  7  3  4  5  10  1  8
2  0  5  4  8  7  9  1  5  7   8  6  1


universe_df = pd.DataFrame({'Symbol': {1: 'A', 2: 'B', 3: 'C', 4: 'D'}, 'Category': {1: 'X', 2: 'Y', 3: 'Y', 4: 'Z'}})
print (universe_df)
  Symbol Category
1      A        X
2      B        Y
3      C        Y
4      D        Z

First check if same ouput, no trailing whitespaces:

print (universe_df)
MultiIndex([('A', 'a1'),
            ('A', 'a2'),
            ('A', 'a3'),
            ('B', 'b1'),
            ('B', 'b2'),
            ('B', 'b3'),
            ('C', 'c1'),
            ('C', 'c2'),
            ('C', 'c3'),
            ('D', 'd1'),
            ('D', 'd2'),
            ('D', 'd3')],
           )

print (universe_df['Symbol'].tolist())
['A', 'B', 'C', 'D']

Create Series for mapping first level in list comprehension and recreate MultiIndex:

d = universe_df.set_index('Symbol')['Category']

data_df.columns = pd.MultiIndex.from_tuples([(d.get(a), a, b) for a, b in data_df.columns])
print (data_df)
   X        Y                  Z      
   A        B        C         D      
  a1 a2 a3 b1 b2 b3 c1 c2 c3  d1 d2 d3
1  1  2  3  5  6  7  3  4  5  10  1  8
2  0  5  4  8  7  9  1  5  7   8  6  1

Or you can mapping first level of MultiIndex:

d = universe_df.set_index('Symbol')['Category']

a = data_df.columns.get_level_values(0)
b = data_df.columns.get_level_values(1)

data_df.columns = [a.map(d),a, b]

Another idea with helper DataFrame:

d = universe_df.set_index('Symbol')['Category']

df = data_df.columns.to_frame()
df.insert(0, '', df[0].map(d))

data_df.columns = pd.MultiIndex.from_frame(df)
print (data_df)
   X        Y                  Z      
0  A        B        C         D      
1 a1 a2 a3 b1 b2 b3 c1 c2 c3  d1 d2 d3
1  1  2  3  5  6  7  3  4  5  10  1  8
2  0  5  4  8  7  9  1  5  7   8  6  1
  • Related