How to drop rows from one dataframe if there is no match in another dataframe based on common column-CodePudding

I have looked through the forum but still can't figure this out.

I am trying to drop rows from a second dataframe if they do not exist in the first dataframe. The condition is based on the r_id column where the ids in that row need to exist in the first dataframe.

I have tried this:

df2 = pd.merge(df2, df1[['r_id']] , on=['r_id'], how='inner')

However, there are still rows (r_ids) that exist in df2 that are not in df1.

df1 looks like this:

r_id  tmp  meds hr
4968    2   0   0
4968    2   0   0
4968    2   0   0
4968    2   0   0
4968    2   0   0

df2 looks like this:

r_id    date
4968    02/08/2020
4968    02/08/2020
4968    02/08/2020
4968    31/10/2020
4968    31/10/2020
4968    31/10/2020
3245    20/12/2020
3245    20/12/2020

however, df2 should look like this:

r_id    date
4968    02/08/2020
4968    02/08/2020
4968    02/08/2020
4968    31/10/2020
4968    31/10/2020
4968    31/10/2020

Any idea on what to do?

CodePudding user response：

Can you try with

df2 = df2[df2.r_id.isin(df1.r_id),]

CodePudding user response：

Firstly, ensure your key is in the correct datatype.

df1.r_id = df1.r_id.astype(str)
df2.r_id = df2.r_id.astype(str)

Then you filter your dataframe with the unique ids.

df2 = df2[df2.r_id.isin(df1.r_id.unique().tolist())]

CodePudding user response：

data1 = {
'r_id': ['4968', '4968', '4968', '4968', '4968'],
'tmp': [2, 2, 2, 2, 2],
'meds': [0, 0, 0, 0, 0],
'hr': [0, 0, 0, 0, 0],
}

data2 = {
'r_id': ['4968', '4968', '4968', '4968', '4968', '4968', '3245', '3245'],
'date': ['02/08/2020', '02/08/2020', '02/08/2020', '31/10/2020', '31/10/2020', '31/10/2020', '20/12/2020', '20/12/2020'],
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df2_copy = df2.copy()

df2 = [df2.r_id.isin(set(df1.r_id))]

# Check the difference between the new df2 and the original one.
print(df2_copy)
print(df2)