I am trying to append two pandas DataFrames that are of different shape. Here are sample DataFrames for reproducibility:
import pandas as pd
df1 = pd.DataFrame({'id': [1,2,3,4],
'val': ['x','y','w','z'],
'val1': ['x1','y1','w1','z1']
})
df2 = pd.DataFrame({'id': [5,6,7,8],
'val2': ['x1','y1','w1','z1'],
'val': ['t','s','v','l'],
})
I'd like to append df2
to df1
. Expected behavior: Non-matching columns, in this case val2
would just be dropped. Retain column ordering of df1
and reset index in appended DataFrame.
id val val1
0 1 x x1
1 2 y y1
2 3 w w1
3 4 z v1
4 5 t NaN
5 6 s NaN
6 7 v NaN
7 8 l NaN
To clarify, not looking for inner join. Need all the columns from df1
and additionally, also print the columns that didn't intersect between the dataframes.
CodePudding user response:
You can use pd.concat
(Edited to match your excepted outout):
pd.concat([df1, df2], ignore_index=True).reindex(df1.columns, axis='columns')
The key element here is to use inner
join rather than outer
which is the default one and will create NaN
values (because it performs a union opeartion) for the columns you didn't want to drop.
output:
id val val1
0 1 x x1
1 2 y y1
2 3 w w1
3 4 z z1
4 5 t NaN
5 6 s NaN
6 7 v NaN
7 8 l NaN
CodePudding user response:
You can do the following:
>>> df1.append(df2, ignore_index=True).reindex(df1.columns, axis='columns')
id val val1
0 1 x x1
1 2 y y1
2 3 w w1
3 4 z z1
4 5 t NaN
5 6 s NaN
6 7 v NaN
7 8 l NaN
CodePudding user response:
You can do:
# get columns present in both dataframes
shared_columns=set(df1.columns).intersection(df2.columns)
# append dataframes
resdf = df1[shared_columns].append(df2[shared_columns])
# reset index & don't save old one as a column
resdf.reset_index(inplace=True,drop=True)
resdf
will be:
id val
0 1 x
1 2 y
2 3 w
3 4 z
4 5 t
5 6 s
6 7 v
7 8 l
CodePudding user response:
Use pandas.concat
and then drop the column 'val2'
using DataFrame.drop
import pandas as pd
res = pd.concat([df1, df2], ignore_index=True).drop(columns='val2')
>>> res
id val val1
0 1 x x1
1 2 y y1
2 3 w w1
3 4 z z1
4 5 t NaN
5 6 s NaN
6 7 v NaN
7 8 l NaN