Pandas find exact words from a list and assign Boolean value if found

Time:07-06

I have a dataframe like this:

import pandas as pd

data = {
  "properties": ["FinancialOffice", "Gas Station", "Office", "K-12 School"],
}
df = pd.DataFrame(data)

This is my list,

proplist = ["Office","Other - Mall","Gym"]

What I am trying to do is use the list to find which values in the dataframe column exactly match one of its entries, and assign each row a Boolean true/false (or 0/1) value accordingly. It has to be an exact match.

Output like this,

properties         flag
FinancialOffice    FALSE
Gas Station        FALSE
Office             TRUE
K-12 School        FALSE

So it returns TRUE only for "Office", because that is an exact match for an entry in the list. "FinancialOffice" is FALSE because it is not in the list, even though it contains "Office".

This was my approach. It prints the matching word correctly, but what I need is a new Boolean column in df that marks which rows are exact matches.

My approach,

import re
s = ','.join(df["properties"])  # join the column into one comma-separated string

for word in proplist:
    # \b anchors the match at word boundaries; re.escape treats the word literally
    if re.search(r'\b' + re.escape(word) + r'\b', s):
        print(word)  # prints only "Office", the matching word

Any help is appreciated. I think it needs to be a regex, as str.contains can't find an exact match.

CodePudding user response:

Try Series.isin:

df["flag"] = df["properties"].isin(proplist)
print(df)

Prints:

        properties   flag
0  FinancialOffice  False
1      Gas Station  False
2           Office   True
3      K-12 School  False
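
The question also allows 0/1 instead of true/false. A minimal sketch of that variant, assuming the same df and proplist as in the question, is to cast the Boolean result to integers:

```python
import pandas as pd

df = pd.DataFrame(
    {"properties": ["FinancialOffice", "Gas Station", "Office", "K-12 School"]}
)
proplist = ["Office", "Other - Mall", "Gym"]

# isin() returns a Boolean Series; astype(int) maps False/True to 0/1.
df["flag"] = df["properties"].isin(proplist).astype(int)
print(df)
```

Only the "Office" row gets 1, since isin compares whole cell values rather than substrings.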

CodePudding user response:

Try converting prop_list into a set then using pandas.Series.isin:

In [1]: import pandas as pd
   ...: 
   ...: d = {"properties": ["FinancialOffice", "Gas Station", "Office", "K-12 School"]}
   ...: df = pd.DataFrame(data=d)
   ...: df
Out[1]: 
        properties
0  FinancialOffice
1      Gas Station
2           Office
3      K-12 School
In [2]: prop_list = ["Office", "Other - Mall", "Gym"]
In [3]: df["flag"] = df["properties"].isin(set(prop_list))
   ...: df
Out[3]: 
        properties   flag
0  FinancialOffice  False
1      Gas Station  False
2           Office   True
3      K-12 School  False
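
If a regex really is required, as the question asks, one possible sketch is Series.str.fullmatch, which is True only when the whole cell matches the pattern. This assumes pandas 1.1 or later (where str.fullmatch is available), and the name pattern is just for illustration:

```python
import re

import pandas as pd

df = pd.DataFrame(
    {"properties": ["FinancialOffice", "Gas Station", "Office", "K-12 School"]}
)
prop_list = ["Office", "Other - Mall", "Gym"]

# Escape each entry so regex metacharacters are matched literally, then
# join the entries into a single non-capturing alternation group.
pattern = "(?:" + "|".join(re.escape(word) for word in prop_list) + ")"

# fullmatch() requires the entire string to match, so "FinancialOffice"
# is not flagged even though it contains "Office".
df["flag"] = df["properties"].str.fullmatch(pattern)
print(df)
```

For plain literal values like these, isin should give the same flags with less work; the regex route is mainly useful when the list entries are themselves patterns.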

CodePudding user response:

You can use map with lambda:

df['flag'] = df['properties'].map(lambda x: x in proplist)

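or, for better performance with a long list, build the set once before the lookup (calling set(proplist) inside the lambda would rebuild the set for every row):

propset = set(proplist)
df['flag'] = df['properties'].map(lambda x: x in propset)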