I have two dataframes
left = pd.DataFrame(
{"K": ["K0", "K0", "K1", "K2"], "A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]}
)
right = pd.DataFrame(
{"K": ["K0", "K0", "K0", "K2"], "C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]}
)
K A B
0 K0 A0 B0
1 K0 A1 B1
2 K1 A2 B2
3 K2 A3 B3
K C D
0 K0 C0 D0
1 K0 C1 D1
2 K0 C2 D2
3 K2 C3 D3
I want to combine them into one dataframe, so that only the rows are considered where the key is present in both dataframes:
K A B C D
0 K0 A0 B0 C0 D0
1 K0 A1 B1 C1 D1
2 K2 A2 B2 C3 D3
If I use pandas.merge I always end up with duplicate rows. Is there a simple way to achieve this?
CodePudding user response:
Enumerate the keys, then merge on both key and enumeration:
(left.assign(enum=left.groupby('K').cumcount())
.merge(right.assign(enum=right.groupby('K').cumcount()),
on=['K', 'enum'])
.drop('enum', axis=1)
)
Output:
K A B C D
0 K0 A0 B0 C0 D0
1 K0 A1 B1 C1 D1
2 K2 A3 B3 C3 D3