Home > Software engineering >  How to delete rows from df1 where the column2 value doesn't appear in df2col1
How to delete rows from df1 where the column2 value doesn't appear in df2col1

Time:09-15

df1 for example is

col1 col2 col3
abcdef ghijkl mnopqr
abcdef1 ghijkl1 mnopqr1

df2 is

col1
ghijkl1

essentially I want to delete rows from df1 where the column2 value doesn't appear in df2col1

the final df1 would be:

col1 col2 col3
abcdef1 ghijkl1 mnopqr1

CodePudding user response:

Make sure you don't have whitespace at the start or end of tha values inside your DataFrame or column names, as the guys pointed out above. i will try to show how to do it in this answer.

# striping whitespace from col names for both DataFrames

df1.rename(columns=lambda x: x.strip(), inplace = True)
df2.rename(columns=lambda x: x.strip(), inplace = True)

# striping whitespace from values in both DataFrames

df1 = df1.apply(lambda x: x.str.strip())
df2 = df2.apply(lambda x: x.str.strip())

# droping rows from df1 where the col2 value doesn't appear in df2 col1

mask = df1["col2"].isin(df2["col1"]) # returns a series of boolean mask(True or False, for the condition we are looking for)

new_df = df1[mask] # Returns what you are asking for :D

Hope this helps.

CodePudding user response:

indexes = df1[!df1.col2.isin(df2.col2)].index
df1 = df1.drop(indexes)
  • Related