A column in my DataFrame is labeled Occupation. In that column, Real Estate is represented in several different ways. These are the three ways it's represented:
RealEstate
REALESTATE
RealEstateDeveloper
Other occupations I don't want
I want to pull every iteration and variation of Real Estate and put it into its own DataFrame. This is what I have:
dfRealEstate = df[(df.Occupation == 'RealEstate') | (df.Occupation == 'REALESTATE') | (df.Occupation == 'RealEstateDeveloper')]
I get a blank DataFrame. My output should look like this:
col1
RealEstate
RealEstate
REALESTATE
REALESTATE
REALESTATE
REALESTATE
RealEstateDeveloper
RealEstateDeveloper
RealEstateDeveloper
CodePudding user response:
Try cleaning the values before matching:
df['Occupation'].str.strip().str.casefold().str.contains('realestate')
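The expression above only produces a boolean mask, so here is a minimal sketch of how it could be used to build the filtered frame. The sample data (including the padded values and the Teacher row) is made up for illustration, and na=False is an added assumption so that missing values count as non-matches.

```python
import pandas as pd

# Hypothetical sample data; the padded values mimic untidy input.
df = pd.DataFrame({
    "Occupation": ["RealEstate", " REALESTATE", "RealEstateDeveloper ", "Teacher", None]
})

# Normalize: trim whitespace, fold case, then look for the substring.
mask = (
    df["Occupation"]
    .str.strip()
    .str.casefold()
    .str.contains("realestate", na=False)
)

# Index with the mask to keep only the matching rows.
dfRealEstate = df[mask]
print(dfRealEstate)
```

Note that the original values are kept as stored; only the comparison is done on the cleaned copy.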
CodePudding user response:
Try to create a mask from a list of variations:
mask_realEstate = df.loc[:,"Occupation"].isin(['RealEstate','REALESTATE','RealEstateDeveloper'])
Now use the mask with .loc to create a new DataFrame:
dfRealEstate = df.loc[mask_realEstate, ["Occupation"]]
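One thing worth checking: isin does exact matching, just like the == comparisons in the question, so it would also come back empty if the stored values carry stray whitespace. Below is a small sketch of a way to inspect the raw values and compare both approaches. The sample data is hypothetical, and the names variations, exact and stripped are introduced only for illustration.

```python
import pandas as pd

# Hypothetical data with a trailing space on one value.
df = pd.DataFrame(
    {"Occupation": ["RealEstate", "REALESTATE ", "RealEstateDeveloper", "Teacher"]}
)

# repr() makes hidden whitespace visible when printing the distinct values.
print([repr(v) for v in df["Occupation"].unique()])

variations = ["RealEstate", "REALESTATE", "RealEstateDeveloper"]

# Exact matching misses the padded value.
exact = df.loc[df["Occupation"].isin(variations), ["Occupation"]]

# Stripping whitespace first catches it.
stripped = df.loc[df["Occupation"].str.strip().isin(variations), ["Occupation"]]
```

Selecting with a list of column names, as here, keeps the result a DataFrame rather than a Series.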