Consider that I have one dataframe that looks like this:
Gender Employed
1 1
0 0
1 0
0 1
And I have another dataframe that looks like:
Name Area
Andy Gender
Ally HR
Chris Employed
Tom Food
I only want to keep the row entries in the area column that correspond to the column names of my first dataframe. These are example dataframes and my actual dataframe has hundreds of columns so no very specific answers involving 'Gender' and 'Employed' will work.
The end result should be
Name Area
Andy Gender
Chris Employed
CodePudding user response:
You can filter a df like this:
df[df['Area'].isin(df2.columns)]
CodePudding user response:
See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#basics for DataFrame slicing basics.
You want to slice the 2nd dataframe based on the columns found in the first dataframe.
df1 = pd.DataFrame(
[(1, 1), (0, 0), (1, 0), (0, 1)],
columns=["Gender", "Employed"]
)
# Gender Employed
# 0 1 1
# 1 0 0
# 2 1 0
# 3 0 1
df2 = pd.DataFrame(
[("Andy", "Gender"), ("Ally", "HR"),
("Chris", "Employed"), ("Tom", "Food")],
columns=["Name", "Area"]
)
# Name Area
# 0 Andy Gender
# 1 Ally HR
# 2 Chris Employed
# 3 Tom Food
df2[df2.Area.isin(df1.columns)]
# Name Area
# 0 Andy Gender
# 2 Chris Employed