Python: filter dataframe rows so that values are in columns of other dataframe-CodePudding

Consider that I have one dataframe that looks like this:

Gender Employed
1      1
0      0 
1      0
0      1

And I have another dataframe that looks like:

Name  Area
Andy  Gender
Ally  HR
Chris Employed
Tom   Food

I only want to keep the row entries in the area column that correspond to the column names of my first dataframe. These are example dataframes and my actual dataframe has hundreds of columns so no very specific answers involving 'Gender' and 'Employed' will work.

The end result should be

Name  Area
Andy  Gender
Chris Employed

CodePudding user response：

You can filter a df like this:

df[df['Area'].isin(df2.columns)]

CodePudding user response：

See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#basics for DataFrame slicing basics.

You want to slice the 2nd dataframe based on the columns found in the first dataframe.

df1 = pd.DataFrame(
    [(1, 1), (0, 0), (1, 0), (0, 1)],
    columns=["Gender", "Employed"]
)

#    Gender  Employed
# 0       1         1
# 1       0         0
# 2       1         0
# 3       0         1

df2 = pd.DataFrame(
    [("Andy", "Gender"), ("Ally", "HR"),
     ("Chris", "Employed"), ("Tom", "Food")],
    columns=["Name", "Area"]
)

#     Name      Area
# 0   Andy    Gender
# 1   Ally        HR
# 2  Chris  Employed
# 3    Tom      Food


df2[df2.Area.isin(df1.columns)]

#     Name      Area
# 0   Andy    Gender
# 2  Chris  Employed