I have 2 dataframes df_Participants and df_Movements and I want to keep the rows in df_Movements
only if a participant is on in df_Participants
.
df_Participants:
id
0 1053.0
1 1052.0
2 1049.0
df_Movements
id participant
0 3902 1053
1 3901 1053
611 2763 979
612 2762 979
Expected results:
id participant
0 3902 1053
1 3901 1053
what I have tried so far:
remove_incomplete_submissions = True
if remove_incomplete_submissions:
df_Movements = df_Movements.loc[df_Movements['participant'].isin(df_Participants['id'])]
When I check the number of unique participants, it does not match. I know this is simple, but I can't seem to notice the issue here.
CodePudding user response:
You can use merge:
new_df = df_Participants.merge(df_Movements, how='left', left_on='id', right_on='participant')
CodePudding user response:
Use isin
to create a boolean mask and get rows that match the condition:
>>> df_Movements[df_Movements['participant'].isin(df_Participants['id'])]
id participant
0 3902 1053
1 3901 1053
Edit: as suggested too by @Ben.T in comments