Home > Enterprise >  How to Multi-Index an existing DataFrame
How to Multi-Index an existing DataFrame

Time:12-14

"Multi-Index" might be the incorrect term of what I'm looking to do, but below is an example of what I'm trying to accomplish.

Original DF:

 HOD       site1_units     site1_orders    site2_units   site2_orders  
hour1         6                3               20             16
hour2         25               10              16             3
hour3         500              50              50             25 
hour4         125              65              59             14  
hour5         16               1               158            6 
hour6         0                0               15             15
hour7         180              18              99             90

Desired DF

                     site1                           site2
 HOD         units          orders           units          orders  
hour1         6                3               20             16
hour2         25               10              16             3
hour3         500              50              50             25 
hour4         125              65              59             14  
hour5         16               1               158            6 
hour6         0                0               15             15
hour7         180              18              99             90

Is there an efficient way to construct/format the dataframe like this? Thank you for the help!

CodePudding user response:

Try this:

df = df.set_index('HOD')
df = df.set_axis(df.columns.map(lambda x: tuple(x.split('_'))),axis=1)

Output:

      site1        site2       
      units orders units orders
HOD                            
hour1     6      3    20     16
hour2    25     10    16      3
hour3   500     50    50     25
hour4   125     65    59     14
hour5    16      1   158      6
hour6     0      0    15     15
hour7   180     18    99     90

CodePudding user response:

Here is one way you can do this:

df = df.set_index("HOD")
index = pd.MultiIndex.from_tuples(zip(["site1","site1", "site2", "site2"],["units", "orders", "units", "orders"]))
df.columns = index

Result:

      site1        site2
      units orders units orders
HOD
hour1     6      3    20     16
hour2    25     10    16      3
hour3   500     50    50     25
hour4   125     65    59     14
hour5    16      1   158      6
hour6     0      0    15     15
hour7   180     18    99     90
  • Related