I have looked through the forum but still can't figure this out.
I am trying to drop rows from a second dataframe if they do not exist in the first dataframe. The condition is based on the r_id column where the ids in that row need to exist in the first dataframe.
I have tried this:
df2 = pd.merge(df2, df1[['r_id']] , on=['r_id'], how='inner')
However, there are still rows (r_ids) that exist in df2 that are not in df1.
df1 looks like this:
r_id tmp meds hr
4968 2 0 0
4968 2 0 0
4968 2 0 0
4968 2 0 0
4968 2 0 0
df2 looks like this:
r_id date
4968 02/08/2020
4968 02/08/2020
4968 02/08/2020
4968 31/10/2020
4968 31/10/2020
4968 31/10/2020
3245 20/12/2020
3245 20/12/2020
however, df2 should look like this:
r_id date
4968 02/08/2020
4968 02/08/2020
4968 02/08/2020
4968 31/10/2020
4968 31/10/2020
4968 31/10/2020
Any idea on what to do?
CodePudding user response:
Can you try with
df2 = df2[df2.r_id.isin(df1.r_id),]
CodePudding user response:
Firstly, ensure your key is in the correct datatype.
df1.r_id = df1.r_id.astype(str)
df2.r_id = df2.r_id.astype(str)
Then you filter your dataframe with the unique ids.
df2 = df2[df2.r_id.isin(df1.r_id.unique().tolist())]
CodePudding user response:
data1 = {
'r_id': ['4968', '4968', '4968', '4968', '4968'],
'tmp': [2, 2, 2, 2, 2],
'meds': [0, 0, 0, 0, 0],
'hr': [0, 0, 0, 0, 0],
}
data2 = {
'r_id': ['4968', '4968', '4968', '4968', '4968', '4968', '3245', '3245'],
'date': ['02/08/2020', '02/08/2020', '02/08/2020', '31/10/2020', '31/10/2020', '31/10/2020', '20/12/2020', '20/12/2020'],
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df2_copy = df2.copy()
df2 = [df2.r_id.isin(set(df1.r_id))]
# Check the difference between the new df2 and the original one.
print(df2_copy)
print(df2)