How to delete rows from df1 where the column2 value doesn't appear in df2col1-CodePudding

df1 for example is

col1	col2	col3
abcdef	ghijkl	mnopqr
abcdef1	ghijkl1	mnopqr1

df2 is

col1
ghijkl1

essentially I want to delete rows from df1 where the column2 value doesn't appear in df2col1

the final df1 would be:

col1	col2	col3
abcdef1	ghijkl1	mnopqr1

CodePudding user response：

Make sure you don't have whitespace at the start or end of tha values inside your DataFrame or column names, as the guys pointed out above. i will try to show how to do it in this answer.

# striping whitespace from col names for both DataFrames

df1.rename(columns=lambda x: x.strip(), inplace = True)
df2.rename(columns=lambda x: x.strip(), inplace = True)

# striping whitespace from values in both DataFrames

df1 = df1.apply(lambda x: x.str.strip())
df2 = df2.apply(lambda x: x.str.strip())

# droping rows from df1 where the col2 value doesn't appear in df2 col1

mask = df1["col2"].isin(df2["col1"]) # returns a series of boolean mask(True or False, for the condition we are looking for)

new_df = df1[mask] # Returns what you are asking for :D

Hope this helps.

CodePudding user response：

indexes = df1[!df1.col2.isin(df2.col2)].index
df1 = df1.drop(indexes)