Home > Mobile >  How to stack two pandas dataframe and not loosing column names?
How to stack two pandas dataframe and not loosing column names?

Time:03-30

I have two very different pandas DataFrames with unique column names. And I want to stack them one under another.Nothing else like "union" or "intersection" needed. Like "copy paste" one under another. And I also don't want to lose any column names.

Df1:

  Col1
1  A    C
2  B    D
3  X    Y

DF2:

    Col2 Col3 Col4 Col5
   1  a   c    x    t
   2  b   d    y    j
   3  b   e    z    k

I tried using pd.concat([df1,df2], axis=0, ignore_index = True) but this resulted in something like this:

  Col1
1  A   C      Nan   Nan   Nan   Nan
2  B   D      Nan   Nan   Nan   Nan
3  X   Y      Nan   Nan   Nan   Nan
4  Nan Nan   Col2  Col3  Col4  Col5
5  Nan Nan    a     c      x     t
6  Nan Nan    b     b      y     j
7  Nan Nan    b     e      z     k

Expected Result:

  Col1
1  A     C     Nan    Nan
2  B     D     Nan    Nan
3  X     Y     Nan    Nan
4 Col2  Col3 Col4   Col5
5  a     c     x      t
6  b     b     y      j
7  b     e     z      k

Original DFs

DF1 enter image description here

DF2 enter image description here

CodePudding user response:

Pandas does not allow for multiple unnamed columns (i.e. columns with the empty string as a name) since column names must be unique. That said, I think this gets close to your desired result.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Col1':[*'ABX'],'':[*'CDY']})
df2 = pd.DataFrame({'Col2':[*'abb'],'Col3':[*'cde'],'Col4':[*'xyz'],'Col5':[*'tjk']})

mat1 = np.vstack([df1.columns,df1.to_numpy()])
mat2 = np.vstack([df2.columns,df2.to_numpy()])

df = pd.concat([pd.DataFrame(mat1),pd.DataFrame(mat2)])

The resulting dataframe df is

      0     1     2     3
0  Col1         NaN   NaN
1     A     C   NaN   NaN
2     B     D   NaN   NaN
3     X     Y   NaN   NaN
0  Col2  Col3  Col4  Col5
1     a     c     x     t
2     b     d     y     j
3     b     e     z     k

CodePudding user response:

This is quite a weird reshape. You can't directly move the header as row, but a trick is to reset_index on the transpose.

For the concat, set aside the column header and put it back:

cols = df1.columns
(pd.concat([df1.set_axis(range(df1.shape[1]), axis=1),
            df2.T.reset_index().T], ignore_index=True)
   .rename(columns=dict(enumerate(cols))))

output:

   Col1           2     3
0     A     C   NaN   NaN
1     B     D   NaN   NaN
2     X     Y   NaN   NaN
3  Col2  Col3  Col4  Col5
4     a     c     x     t
5     b     d     y     j
6     b     e     z     k
  • Related