Home > Software design >  Check If a List of Substrings Is Within Pandas DF Column AND Replace DF Value With Substring If True
Check If a List of Substrings Is Within Pandas DF Column AND Replace DF Value With Substring If True

Time:10-20

I am using python 3.8 in a Jupyter Notebook

What is the best way to see if a string in our dataframe column contains a substring within a list and then REPLACE our dataframe column value with the substring it contained? Note: there will only be one match. Mutliple substrings will not match on our comparison.

Ex: '/google123' contains a substring 'google', so the new value in our df would be 'google' instead of '/google123'.

Dummy Data Below:

import pandas as pd

channel = ['google','bing','facebook','google','facebook']
page = ['/cars?exp=123', '/google123', '/toytrucks', '/cars','/randompage']

compare_list = ['google','cars?exp']

df = pd.DataFrame(zip(channel,page), columns=['channel','page'])

My Attempt Was:

for i in range(0,len(df)):
        
    cur_page = df.loc[i, "page"]
        
    for x in compare_list:
     
        if x in cur_page:
            cur_page = x
            
        else:
            pass

CodePudding user response:

You can use apply to pass the values to the function:

def check(x):
    for i in compare_list:
        if i in x:
            return i
    return x

df['page'] = df['page'].apply(check)

Outputs:

channel page
0 google cars?exp
1 bing google
2 facebook /toytrucks
3 google /cars
4 facebook /randompage

PS. in case you just wish to extract the words, you can also use regex:

df['page'].str.extract('([^\W\d_] )')
  • Related