Keep rows in dataframe where values are present in another dataframe-CodePudding

I have two dataframes, df1 and df2:

Transport	City	Color
Car	Paris	red
Car	London	white
Bike	Paris	red
Car	New York	blue

Color
red
blue
blue

They are not the same length.

I want to make a new dataframe based on the first one, where I only keep the row, if the color is also present in the second dataframe, such that the output would be:

Transport	City	Color
Car	Paris	red
Bike	Paris	red
Car	New York	Blue

Is there a way to do that? I want to write something like:

df1[df1.Color.isin(df2.Color)]

But it does not seem to work.

edit: I think the issue is that the data type in the first dataframe is str and not in the second.

CodePudding user response：

You can try with a merge after dropping duplicates to avoid cartesian product:

df_1.merge(df_2.drop_duplicates(),left_on=['City','Color'],right_on=['City','Color'])

Outputting:

  Transport      City Color
0       Car     Paris   red
1      Bike     Paris   red
2       Car  New York  blue

CodePudding user response：

If both should match you need to merge and then filter out the nulls:

df1 = df1.merge(right=df2.drop_duplicates(), on=['City','Color'], how='left')
df1.dropna(subset=['Transport'], inplace = True)

Let me know if this works for you