Consider the following DataFrames
df
:
df =
kind A B
names u1 u2 u3 y1 y2
Time
0.0 0.5083 0.1007 0.8001 0.7373 0.1387
0.1 0.6748 0.0354 0.0076 0.8421 0.2670
0.2 0.1753 0.1013 0.5231 0.8060 0.0040
0.3 0.5953 0.6505 0.7127 0.0771 0.1023
0.4 0.4409 0.0193 0.6765 0.9800 0.0715
and df1
:
df1 =
kind A
names potato
Time
0.0 0.4043
0.1 0.9801
0.2 0.1298
0.3 0.9564
0.4 0.4409
I want to concatenate the two DataFrames
such that the resulting DataFrame
is:
df2 =
kind A B
names u1 u2 u3 potato y1 y2
Time
0.0 0.5083 0.1007 0.8001 0.5083 0.7373 0.1387
0.1 0.6748 0.0354 0.0076 0.6748 0.8421 0.2670
0.2 0.1753 0.1013 0.5231 0.1753 0.8060 0.0040
0.3 0.5953 0.6505 0.7127 0.5953 0.0771 0.1023
0.4 0.4409 0.0193 0.6765 0.4409 0.9800 0.0715
What I run is pandas.concat([df1, df2, axis=1).sort_index(level="kind", axis=1)
but that results in
kind A B
names potato u1 u2 u3 y1 y2
Time
0.0 0.4043 0.5083 0.1007 0.8001 0.7373 0.1387
0.1 0.9801 0.6748 0.0354 0.0076 0.8421 0.2670
0.2 0.1298 0.1753 0.1013 0.5231 0.8060 0.0040
0.3 0.9564 0.5953 0.6505 0.7127 0.0771 0.1023
0.4 0.4409 0.4409 0.0193 0.6765 0.9800 0.0715
i.e. the column potato
is appended at the beginning of df["A"]
whereas I want it appended to the end.
CodePudding user response:
Add parameter sort_remaining=False
in DataFrame.sort_index
:
df = pd.concat([df1, df2], axis=1).sort_index(level="kind", axis=1, sort_remaining=False)
print (df)
kind A B
names u1 u2 u3 potato y1 y2
Time
0.0 0.5083 0.1007 0.8001 0.4043 0.7373 0.1387
0.1 0.6748 0.0354 0.0076 0.9801 0.8421 0.2670
0.2 0.1753 0.1013 0.5231 0.1298 0.8060 0.0040
0.3 0.5953 0.6505 0.7127 0.9564 0.0771 0.1023
0.4 0.4409 0.0193 0.6765 0.4409 0.9800 0.0715
CodePudding user response:
There are two possible solutions to this problem, one safer than the other. Both rely on the use of the df.insert
paradigm, however the 1st one is simple and dangerous, while the second one slightly more complicated, but allows for more "security".
Solution 1
You can simply use the df.insert
command, which allows to insert a column at a specified index. In your case, that would mean:
df.insert(loc=3, column=df1.columns[0], value=df1.values)
Which gives the desired output:
kind A B
names u1 u2 u3 Potato y1 y2
Time
0.0 0.5083 0.1007 0.8001 0.4043 0.7373 0.1387
0.1 0.6748 0.0354 0.0076 0.9801 0.8421 0.2670
0.2 0.1753 0.1013 0.5231 0.1298 0.8060 0.0040
0.3 0.5953 0.6505 0.7127 0.9564 0.0771 0.1023
0.4 0.4409 0.0193 0.6765 0.4409 0.9800 0.0715
However, it is not really a merge, and could easily give exceptions. For example, if df1 has a different length, it will give an Exception of the type (in the example has 1 less value).
ValueError: Length of values (4) does not match length of index (5)
This merging is also not performed with respect to the index, i.e., the index of df1
is completely ignored in the merging.
I would suggest to use this only if your are certain of the type of data your are merging.
Solution 2
First performing a concatenate and then an insert allows to account for different indexes. This can be done like this:
df2 = pd.concat([df, df1], axis=1)
colname: tuple = ('A', 'Potato')
colvalue: pd.Series = df2[colname]
df2.pop(colname)
df2.insert(loc=3, column=colname, value=colvalue)
Indeed, if df1
has a different index, this method allows to account for that.
Example
In order to explain the difference in the two solutions, let us consider the case in which df
and df1
have different indexes.
That is:
display(df)
kind A B
names u1 u2 u3 y1 y2
Time
0.0 0.5083 0.1007 0.8001 0.7373 0.1387
0.1 0.6748 0.0354 0.0076 0.8421 0.2670
0.2 0.1753 0.1013 0.5231 0.8060 0.0040
0.3 0.5953 0.6505 0.7127 0.0771 0.1023
0.4 0.4409 0.0193 0.6765 0.9800 0.0715
and
display(df1)
kind A
names Potato
0.0 0.4043
0.1 0.9801
0.2 0.1298
0.3 0.9564
0.5 0.4409
In this scenario, Solution 1 will lead to the following merge:
df.insert(loc=3, column=df1.columns[0], value=df1.values)
display(df)
kind A B
names u1 u2 u3 Potato y1 y2
Time
0.0 0.5083 0.1007 0.8001 0.4043 0.7373 0.1387
0.1 0.6748 0.0354 0.0076 0.9801 0.8421 0.2670
0.2 0.1753 0.1013 0.5231 0.1298 0.8060 0.0040
0.3 0.5953 0.6505 0.7127 0.9564 0.0771 0.1023
0.4 0.4409 0.0193 0.6765 0.4409 0.9800 0.0715
which would be wrong. Solution 2, however, would lead to the correct output:
df2 = pd.concat([df, df1], axis=1)
colname: tuple = ('A', 'Potato')
colvalue: pd.Series = df2[colname]
df2.pop(colname)
df2.insert(loc=3, column=colname, value=colvalue)
display(df2)
kind A B
names u1 u2 u3 Potato y1 y2
0.0 0.5083 0.1007 0.8001 0.4043 0.7373 0.1387
0.1 0.6748 0.0354 0.0076 0.9801 0.8421 0.2670
0.2 0.1753 0.1013 0.5231 0.1298 0.8060 0.0040
0.3 0.5953 0.6505 0.7127 0.9564 0.0771 0.1023
0.4 0.4409 0.0193 0.6765 NaN 0.9800 0.0715
0.5 NaN NaN NaN 0.4409 NaN NaN
Recap best solution
Use concat
and then use pop
and insert
the new column. If in a hurry, and sure of what you're doing, you can go ahead and use just `insert, but be careful.