Home > other >  Concatenate columns at the end of a MultiIndex columns DataFrame
Concatenate columns at the end of a MultiIndex columns DataFrame

Time:11-03

Consider the following DataFrames df :

df = 
kind                            A               B
names          u1      u2      u3      y1      y2
Time
0.0        0.5083  0.1007  0.8001  0.7373  0.1387
0.1        0.6748  0.0354  0.0076  0.8421  0.2670
0.2        0.1753  0.1013  0.5231  0.8060  0.0040
0.3        0.5953  0.6505  0.7127  0.0771  0.1023
0.4        0.4409  0.0193  0.6765  0.9800  0.0715

and df1:

df1 = 
kind            A      
names      potato        
Time
0.0        0.4043  
0.1        0.9801 
0.2        0.1298  
0.3        0.9564  
0.4        0.4409  

I want to concatenate the two DataFrames such that the resulting DataFrame is:

df2 = 
kind                                    A               B
names          u1      u2      u3  potato      y1      y2
Time
0.0        0.5083  0.1007  0.8001  0.5083  0.7373  0.1387
0.1        0.6748  0.0354  0.0076  0.6748  0.8421  0.2670
0.2        0.1753  0.1013  0.5231  0.1753  0.8060  0.0040
0.3        0.5953  0.6505  0.7127  0.5953  0.0771  0.1023
0.4        0.4409  0.0193  0.6765  0.4409  0.9800  0.0715

What I run is pandas.concat([df1, df2, axis=1).sort_index(level="kind", axis=1) but that results in

kind                                    A               B
names      potato      u1      u2      u3      y1      y2
Time
0.0        0.4043  0.5083  0.1007  0.8001  0.7373  0.1387
0.1        0.9801  0.6748  0.0354  0.0076  0.8421  0.2670
0.2        0.1298  0.1753  0.1013  0.5231  0.8060  0.0040
0.3        0.9564  0.5953  0.6505  0.7127  0.0771  0.1023
0.4        0.4409  0.4409  0.0193  0.6765  0.9800  0.0715

i.e. the column potato is appended at the beginning of df["A"] whereas I want it appended to the end.

CodePudding user response:

Add parameter sort_remaining=False in DataFrame.sort_index:

df = pd.concat([df1, df2], axis=1).sort_index(level="kind", axis=1, sort_remaining=False)
print (df)
kind        A                               B        
names      u1      u2      u3  potato      y1      y2
Time                                                 
0.0    0.5083  0.1007  0.8001  0.4043  0.7373  0.1387
0.1    0.6748  0.0354  0.0076  0.9801  0.8421  0.2670
0.2    0.1753  0.1013  0.5231  0.1298  0.8060  0.0040
0.3    0.5953  0.6505  0.7127  0.9564  0.0771  0.1023
0.4    0.4409  0.0193  0.6765  0.4409  0.9800  0.0715

CodePudding user response:

There are two possible solutions to this problem, one safer than the other. Both rely on the use of the df.insert paradigm, however the 1st one is simple and dangerous, while the second one slightly more complicated, but allows for more "security".

Solution 1

You can simply use the df.insert command, which allows to insert a column at a specified index. In your case, that would mean:

df.insert(loc=3, column=df1.columns[0], value=df1.values)

Which gives the desired output:

kind    A   B
names   u1  u2  u3  Potato  y1  y2
Time                        
0.0 0.5083  0.1007  0.8001  0.4043  0.7373  0.1387
0.1 0.6748  0.0354  0.0076  0.9801  0.8421  0.2670
0.2 0.1753  0.1013  0.5231  0.1298  0.8060  0.0040
0.3 0.5953  0.6505  0.7127  0.9564  0.0771  0.1023
0.4 0.4409  0.0193  0.6765  0.4409  0.9800  0.0715

However, it is not really a merge, and could easily give exceptions. For example, if df1 has a different length, it will give an Exception of the type (in the example has 1 less value).

ValueError: Length of values (4) does not match length of index (5)

This merging is also not performed with respect to the index, i.e., the index of df1 is completely ignored in the merging.

I would suggest to use this only if your are certain of the type of data your are merging.

Solution 2

First performing a concatenate and then an insert allows to account for different indexes. This can be done like this:

df2 = pd.concat([df, df1], axis=1)
colname: tuple = ('A', 'Potato')
colvalue: pd.Series = df2[colname]
df2.pop(colname)
df2.insert(loc=3, column=colname, value=colvalue)

Indeed, if df1 has a different index, this method allows to account for that.

Example

In order to explain the difference in the two solutions, let us consider the case in which df and df1 have different indexes. That is:

display(df)
kind    A   B
names   u1  u2  u3  y1  y2
Time                    
0.0 0.5083  0.1007  0.8001  0.7373  0.1387
0.1 0.6748  0.0354  0.0076  0.8421  0.2670
0.2 0.1753  0.1013  0.5231  0.8060  0.0040
0.3 0.5953  0.6505  0.7127  0.0771  0.1023
0.4 0.4409  0.0193  0.6765  0.9800  0.0715

and

display(df1)
kind    A
names   Potato
0.0 0.4043
0.1 0.9801
0.2 0.1298
0.3 0.9564
0.5 0.4409

In this scenario, Solution 1 will lead to the following merge:

df.insert(loc=3, column=df1.columns[0], value=df1.values)
display(df)
kind    A   B
names   u1  u2  u3  Potato  y1  y2
Time                        
0.0 0.5083  0.1007  0.8001  0.4043  0.7373  0.1387
0.1 0.6748  0.0354  0.0076  0.9801  0.8421  0.2670
0.2 0.1753  0.1013  0.5231  0.1298  0.8060  0.0040
0.3 0.5953  0.6505  0.7127  0.9564  0.0771  0.1023
0.4 0.4409  0.0193  0.6765  0.4409  0.9800  0.0715

which would be wrong. Solution 2, however, would lead to the correct output:

df2 = pd.concat([df, df1], axis=1)
colname: tuple = ('A', 'Potato')
colvalue: pd.Series = df2[colname]
df2.pop(colname)
df2.insert(loc=3, column=colname, value=colvalue)
display(df2)
kind    A   B
names   u1  u2  u3  Potato  y1  y2
0.0 0.5083  0.1007  0.8001  0.4043  0.7373  0.1387
0.1 0.6748  0.0354  0.0076  0.9801  0.8421  0.2670
0.2 0.1753  0.1013  0.5231  0.1298  0.8060  0.0040
0.3 0.5953  0.6505  0.7127  0.9564  0.0771  0.1023
0.4 0.4409  0.0193  0.6765  NaN 0.9800  0.0715
0.5 NaN NaN NaN 0.4409  NaN NaN

Recap best solution

Use concat and then use pop and insert the new column. If in a hurry, and sure of what you're doing, you can go ahead and use just `insert, but be careful.

  • Related