Home > front end >  merge two pandas dataframes without duplicating items
merge two pandas dataframes without duplicating items

Time:12-04

I have two dataframes

left = pd.DataFrame(
    {"K": ["K0", "K0", "K1", "K2"], "A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]}
)

right = pd.DataFrame(
    {"K": ["K0", "K0", "K0", "K2"], "C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]}
)

    K   A   B
0  K0  A0  B0
1  K0  A1  B1
2  K1  A2  B2
3  K2  A3  B3

    K   C   D
0  K0  C0  D0
1  K0  C1  D1
2  K0  C2  D2
3  K2  C3  D3

I want to combine them into one dataframe, so that only the rows are considered where the key is present in both dataframes:

   K   A   B   C   D
0  K0  A0  B0  C0  D0
1  K0  A1  B1  C1  D1
2  K2  A2  B2  C3  D3

If I use pandas.merge I always end up with duplicate rows. Is there a simple way to achieve this?

CodePudding user response:

Enumerate the keys, then merge on both key and enumeration:

(left.assign(enum=left.groupby('K').cumcount())
     .merge(right.assign(enum=right.groupby('K').cumcount()),
            on=['K', 'enum'])
     .drop('enum', axis=1)
)

Output:

    K   A   B   C   D
0  K0  A0  B0  C0  D0
1  K0  A1  B1  C1  D1
2  K2  A3  B3  C3  D3
  • Related