Home > database >  Multiindex Dataframe Merge Columns together
Multiindex Dataframe Merge Columns together

Time:07-19

I have a Dataframe in the following format:

                                    2022-07-04                             2022-07-11
                                    27          27.1        27.2           28          28.1         28.2
index      id           employee                
1          1234565      Max         project1    customer1   in progress    project1    customer1    in progress
                                    project2    customer2   in progress    project2    customer2    in progress

What I want to achieve is the following format:

                                    2022-07-04                          2022-07-11
                                    27                                  28          
index      id           employee                
1          1234565      Max         project1, customer1, in progress    project1, customer1, in progress
                                    project2, customer2, in progress    project2, customer2, in progress
                                           

So I have a multilevel Index, and I want to merge the columns to one column, which are under the Top Level column. So for example under the top level '2022-07-04' I have three columns (27, 27.1, 27.2) and I want to merge these 3 columns to one column, and the same for the next date (2022-07-11) and so on... The size is always 3 columns, so below top level are always 3 columns.

CodePudding user response:

Use Index.get_level_values for columns names by levels, in second remove values after . and use custom lambda function for join:

c1 = df.columns.get_level_values(1).str.split('.').str[0]
c2 = df.columns.get_level_values(0)

df = df.groupby([c2, c1], axis=1).apply(lambda x: x.agg(', '.join, axis=1))
print (df)
                                              2022-07-04  \
                                                      27   
index id      employee                                     
1     1234565 Max       project1, customer1, in progress   
              Max       project2, customer2, in progress   

                                              2022-07-11  
                                                      28  
index id      employee                                    
1     1234565 Max       project1, customer1, in progress  
              Max       project2, customer2, in progress  

CodePudding user response:

It is a bit hard for me to understand the datastructure. I guess you could create a new array in the shape which you want, say

newArray = np.zeros((5,2),dtype = object)

and then go through the array with a for-loop. And for each column set the newArray-column to the one of the original array, except at column 4 and 5, where you can use sth like np.append or str.join to link the remaining entries together.

  • Related