Home > Enterprise >  How to append pandas dataframes with different shapes
How to append pandas dataframes with different shapes

Time:11-14

I am trying to append two pandas DataFrames that are of different shape. Here are sample DataFrames for reproducibility:

import pandas as pd

df1 = pd.DataFrame({'id': [1,2,3,4],
                    'val': ['x','y','w','z'],
                    'val1': ['x1','y1','w1','z1']
                   })

df2 = pd.DataFrame({'id': [5,6,7,8],
                    'val2': ['x1','y1','w1','z1'],
                    'val': ['t','s','v','l'],
                   })

I'd like to append df2 to df1. Expected behavior: Non-matching columns, in this case val2 would just be dropped. Retain column ordering of df1 and reset index in appended DataFrame.

   id  val val1
0   1   x   x1
1   2   y   y1
2   3   w   w1
3   4   z   v1
4   5   t   NaN
5   6   s   NaN
6   7   v   NaN
7   8   l   NaN

To clarify, not looking for inner join. Need all the columns from df1 and additionally, also print the columns that didn't intersect between the dataframes.

CodePudding user response:

You can use pd.concat (Edited to match your excepted outout):

pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')

The key element here is to use inner join rather than outer which is the default one and will create NaN values (because it performs a union opeartion) for the columns you didn't want to drop.

output:

   id  val val1
0   1   x   x1
1   2   y   y1
2   3   w   w1
3   4   z   z1
4   5   t   NaN
5   6   s   NaN
6   7   v   NaN
7   8   l   NaN

CodePudding user response:

You can do the following:

>>> df1.append(df2, ignore_index=True).reindex(df1.columns, axis='columns')

   id val val1
0   1   x   x1
1   2   y   y1
2   3   w   w1
3   4   z   z1
4   5   t  NaN
5   6   s  NaN
6   7   v  NaN
7   8   l  NaN

CodePudding user response:

You can do:

# get columns present in both dataframes
shared_columns=set(df1.columns).intersection(df2.columns)

# append dataframes
resdf = df1[shared_columns].append(df2[shared_columns])

# reset index & don't save old one as a column
resdf.reset_index(inplace=True,drop=True)

resdf will be:

   id val
0   1   x
1   2   y
2   3   w
3   4   z
4   5   t
5   6   s
6   7   v
7   8   l

CodePudding user response:

Use pandas.concat and then drop the column 'val2' using DataFrame.drop

import pandas as pd

res = pd.concat([df1, df2], ignore_index=True).drop(columns='val2')

>>> res

   id val val1
0   1   x   x1
1   2   y   y1
2   3   w   w1
3   4   z   z1
4   5   t  NaN
5   6   s  NaN
6   7   v  NaN
7   8   l  NaN
  • Related