Home > Software design >  Keep rows in dataframe where values are present in another dataframe
Keep rows in dataframe where values are present in another dataframe

Time:08-09

I have two dataframes, df1 and df2:

Transport City Color
Car Paris red
Car London white
Bike Paris red
Car New York blue
Color
red
blue
blue

They are not the same length.

I want to make a new dataframe based on the first one, where I only keep the row, if the color is also present in the second dataframe, such that the output would be:

Transport City Color
Car Paris red
Bike Paris red
Car New York Blue

Is there a way to do that? I want to write something like:

df1[df1.Color.isin(df2.Color)]

But it does not seem to work.

edit: I think the issue is that the data type in the first dataframe is str and not in the second.

CodePudding user response:

You can try with a merge after dropping duplicates to avoid cartesian product:

df_1.merge(df_2.drop_duplicates(),left_on=['City','Color'],right_on=['City','Color'])

Outputting:

  Transport      City Color
0       Car     Paris   red
1      Bike     Paris   red
2       Car  New York  blue

CodePudding user response:

If both should match you need to merge and then filter out the nulls:

df1 = df1.merge(right=df2.drop_duplicates(), on=['City','Color'], how='left')
df1.dropna(subset=['Transport'], inplace = True)

Let me know if this works for you

  • Related