df1 for example is
col1 | col2 | col3 |
---|---|---|
abcdef | ghijkl | mnopqr |
abcdef1 | ghijkl1 | mnopqr1 |
df2 is
col1 |
---|
ghijkl1 |
essentially I want to delete rows from df1 where the column2 value doesn't appear in df2col1
the final df1 would be:
col1 | col2 | col3 |
---|---|---|
abcdef1 | ghijkl1 | mnopqr1 |
CodePudding user response:
Make sure you don't have whitespace at the start or end of tha values inside your DataFrame or column names, as the guys pointed out above. i will try to show how to do it in this answer.
# striping whitespace from col names for both DataFrames
df1.rename(columns=lambda x: x.strip(), inplace = True)
df2.rename(columns=lambda x: x.strip(), inplace = True)
# striping whitespace from values in both DataFrames
df1 = df1.apply(lambda x: x.str.strip())
df2 = df2.apply(lambda x: x.str.strip())
# droping rows from df1 where the col2 value doesn't appear in df2 col1
mask = df1["col2"].isin(df2["col1"]) # returns a series of boolean mask(True or False, for the condition we are looking for)
new_df = df1[mask] # Returns what you are asking for :D
Hope this helps.
CodePudding user response:
indexes = df1[!df1.col2.isin(df2.col2)].index
df1 = df1.drop(indexes)