I am using python 3.8 in a Jupyter Notebook
What is the best way to see if a string in our dataframe column contains a substring within a list and then REPLACE our dataframe column value with the substring it contained? Note: there will only be one match. Mutliple substrings will not match on our comparison.
Ex: '/google123' contains a substring 'google', so the new value in our df would be 'google' instead of '/google123'.
Dummy Data Below:
import pandas as pd
channel = ['google','bing','facebook','google','facebook']
page = ['/cars?exp=123', '/google123', '/toytrucks', '/cars','/randompage']
compare_list = ['google','cars?exp']
df = pd.DataFrame(zip(channel,page), columns=['channel','page'])
My Attempt Was:
for i in range(0,len(df)):
cur_page = df.loc[i, "page"]
for x in compare_list:
if x in cur_page:
cur_page = x
else:
pass
CodePudding user response:
You can use apply to pass the values to the function:
def check(x):
for i in compare_list:
if i in x:
return i
return x
df['page'] = df['page'].apply(check)
Outputs:
channel | page | |
---|---|---|
0 | cars?exp | |
1 | bing | |
2 | /toytrucks | |
3 | /cars | |
4 | /randompage |
PS. in case you just wish to extract the words, you can also use regex:
df['page'].str.extract('([^\W\d_] )')