How to append pandas dataframes with different shapes-CodePudding

I am trying to append two pandas DataFrames that are of different shape. Here are sample DataFrames for reproducibility:

import pandas as pd

df1 = pd.DataFrame({'id': [1,2,3,4],
                    'val': ['x','y','w','z'],
                    'val1': ['x1','y1','w1','z1']
                   })

df2 = pd.DataFrame({'id': [5,6,7,8],
                    'val2': ['x1','y1','w1','z1'],
                    'val': ['t','s','v','l'],
                   })

I'd like to append df2 to df1. Expected behavior: Non-matching columns, in this case val2 would just be dropped. Retain column ordering of df1 and reset index in appended DataFrame.

   id  val val1
0   1   x   x1
1   2   y   y1
2   3   w   w1
3   4   z   v1
4   5   t   NaN
5   6   s   NaN
6   7   v   NaN
7   8   l   NaN

To clarify, not looking for inner join. Need all the columns from df1 and additionally, also print the columns that didn't intersect between the dataframes.

CodePudding user response：

You can use pd.concat (Edited to match your excepted outout):

pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')

The key element here is to use inner join rather than outer which is the default one and will create NaN values (because it performs a union opeartion) for the columns you didn't want to drop.

output:

   id  val val1
0   1   x   x1
1   2   y   y1
2   3   w   w1
3   4   z   z1
4   5   t   NaN
5   6   s   NaN
6   7   v   NaN
7   8   l   NaN

CodePudding user response：

You can do the following:

>>> df1.append(df2, ignore_index=True).reindex(df1.columns, axis='columns')

   id val val1
0   1   x   x1
1   2   y   y1
2   3   w   w1
3   4   z   z1
4   5   t  NaN
5   6   s  NaN
6   7   v  NaN
7   8   l  NaN

CodePudding user response：

You can do:

# get columns present in both dataframes
shared_columns=set(df1.columns).intersection(df2.columns)

# append dataframes
resdf = df1[shared_columns].append(df2[shared_columns])

# reset index & don't save old one as a column
resdf.reset_index(inplace=True,drop=True)

resdf will be:

CodePudding user response：

Use pandas.concat and then drop the column 'val2' using DataFrame.drop

import pandas as pd

res = pd.concat([df1, df2], ignore_index=True).drop(columns='val2')

>>> res

   id val val1
0   1   x   x1
1   2   y   y1
2   3   w   w1
3   4   z   z1
4   5   t  NaN
5   6   s  NaN
6   7   v  NaN
7   8   l  NaN