I've got 2 data frames like this,
df1:
columnA columnB columnC
data1 2 d
data1 2 d
data1 2 d
data2 3 r
data2 3 r
data3 4 g
data3 4 g
df2:
columnA columnD columnE
data1 a x
data1 a x
data1 a x
data2 b y
data2 b y
After merging I want the output like below,
Expected output:
columnA columnB columnC columnD columnE
data1 2 d a x
data1 2 d a x
data1 2 d a x
data2 3 r b y
data2 3 r b y
data3 4 g
data3 4 g
Is there a way to do this?
I've tried,
combined = pd.merge(df1, df2, how="inner", on="columnA")
but this results in more rows than originally.
CodePudding user response:
You first need to remove the duplicates from your second dataframe:
df2_nodup = df2.drop_duplicates()
Now you can use pd.merge()
to create the desired output:
>>> pd.merge(df1, df2_nodup, how='left')
columnA columnB columnC columnD columnE
0 data1 2 d a x
1 data1 2 d a x
2 data1 2 d a x
3 data2 3 r b y
4 data2 3 r b y
5 data3 4 g NaN NaN
6 data3 4 g NaN NaN
CodePudding user response:
You can achieve that by simply using
Documentation: pandas.concat() concatenate pandas objects along a particular axis with optional set logic along the other axes. Can also add a layer of hierarchical indexing on the concatenation axis, which may be useful if the labels are the same (or overlapping) on the passed axis number.