Home > Software design >  How to drop rows from one dataframe if there is no match in another dataframe based on common column
How to drop rows from one dataframe if there is no match in another dataframe based on common column

Time:09-21

I have looked through the forum but still can't figure this out.

I am trying to drop rows from a second dataframe if they do not exist in the first dataframe. The condition is based on the r_id column where the ids in that row need to exist in the first dataframe.

I have tried this:

df2 = pd.merge(df2, df1[['r_id']] , on=['r_id'], how='inner')

However, there are still rows (r_ids) that exist in df2 that are not in df1.

df1 looks like this:

r_id  tmp  meds hr
4968    2   0   0
4968    2   0   0
4968    2   0   0
4968    2   0   0
4968    2   0   0


df2 looks like this:

r_id    date
4968    02/08/2020
4968    02/08/2020
4968    02/08/2020
4968    31/10/2020
4968    31/10/2020
4968    31/10/2020
3245    20/12/2020
3245    20/12/2020

however, df2 should look like this:

r_id    date
4968    02/08/2020
4968    02/08/2020
4968    02/08/2020
4968    31/10/2020
4968    31/10/2020
4968    31/10/2020

Any idea on what to do?

CodePudding user response:

Can you try with

df2 = df2[df2.r_id.isin(df1.r_id),]

CodePudding user response:

Firstly, ensure your key is in the correct datatype.

df1.r_id = df1.r_id.astype(str)
df2.r_id = df2.r_id.astype(str)

Then you filter your dataframe with the unique ids.

df2 = df2[df2.r_id.isin(df1.r_id.unique().tolist())]

CodePudding user response:

data1 = {
'r_id': ['4968', '4968', '4968', '4968', '4968'],
'tmp': [2, 2, 2, 2, 2],
'meds': [0, 0, 0, 0, 0],
'hr': [0, 0, 0, 0, 0],
}

data2 = {
'r_id': ['4968', '4968', '4968', '4968', '4968', '4968', '3245', '3245'],
'date': ['02/08/2020', '02/08/2020', '02/08/2020', '31/10/2020', '31/10/2020', '31/10/2020', '20/12/2020', '20/12/2020'],
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df2_copy = df2.copy()

df2 = [df2.r_id.isin(set(df1.r_id))]

# Check the difference between the new df2 and the original one.
print(df2_copy)
print(df2)
  • Related