I have two dataframes, df1 and df2:
Transport | City | Color |
---|---|---|
Car | Paris | red |
Car | London | white |
Bike | Paris | red |
Car | New York | blue |
Color |
---|
red |
blue |
blue |
They are not the same length.
I want to make a new dataframe based on the first one, where I only keep the row, if the color is also present in the second dataframe, such that the output would be:
Transport | City | Color |
---|---|---|
Car | Paris | red |
Bike | Paris | red |
Car | New York | Blue |
Is there a way to do that? I want to write something like:
df1[df1.Color.isin(df2.Color)]
But it does not seem to work.
edit: I think the issue is that the data type in the first dataframe is str and not in the second.
CodePudding user response:
You can try with a merge after dropping duplicates to avoid cartesian product:
df_1.merge(df_2.drop_duplicates(),left_on=['City','Color'],right_on=['City','Color'])
Outputting:
Transport City Color
0 Car Paris red
1 Bike Paris red
2 Car New York blue
CodePudding user response:
If both should match you need to merge and then filter out the nulls:
df1 = df1.merge(right=df2.drop_duplicates(), on=['City','Color'], how='left')
df1.dropna(subset=['Transport'], inplace = True)
Let me know if this works for you