I have two dataframes, one containing lot of columns
df1:
id age topic date text
1 23 Student 1.1. Lorem
2 19 Student 1.2. Cupcake
20 19 Student 1.2. Lorem Ipsum
190 21 Student 11.1. Cupcake Ipsum
And one with two columns
df2:
id count
1 105
20 4843
31 361
What I'm trying to accomplish here is to drop all the rows from df1 that are not mentioned in df2 (their id is not there).
result_df:
id age topic date text
1 23 Student 1.1. Lorem
20 19 Student 1.2. Lorem Ipsum
I've tried this but it's not working:
result_df = df1.drop(df1[df1['id'] != df2['id']].index)
Could you please help me out?
CodePudding user response:
result_df = df1.merge(df2['id'])
Given:
df1:
id age topic
0 1 23 Student
1 2 19 Student
2 20 19 Student
3 190 21 Student
df2:
id count
0 1 105
1 20 4843
2 31 361
Doing:
result_df = df1.merge(df2['id'])
print(result_df)
Output:
id age topic
0 1 23 Student
1 20 19 Student
CodePudding user response:
You can do:
df1 = df1.set_index('id')
df2 = df2.set_index('id')
result_df = df1.loc[df1.index.intersection(df2.index)].reset_index()
Output: result_df
id age topic date text
0 1 23 Student 1.1. Lorem
1 20 19 Student 1.2. Lorem Ipsum