I want to check if certains value present in one column and check in another column using python if we get data then mark as yes
We have below dataframe
import pandas as pd
import numpy as np
a1=["Highschool.sg","school","school.sggs","school.coep","school.mit","address","address.pune","address.Nanded","address.mumbai"]
a2=[34,56,55,34,23,60,34,56,100]
a3=[np.nan,str(["sggs","coep","mit"]),np.nan,np.nan,np.nan,str(["pune","Nanded"]),np.nan,np.nan,np.nan]
df =pd.DataFrame(list(zip(a1,a2,a3)),columns=['data','id','required'])
Here if we get any value like ['sggs','coep','mit'] then need to check in column if any keyword match with this value then mark as YES
Expected output
CodePudding user response:
You can extract
the words (or use any method to get individual strings), concatenate to 'data', and use the list of words to find the matches for boolean indexing:
target = (df['data'] '.'
df['required'].str.extractall('(\w )')[0].droplevel(1)
).dropna()
# ['school.sggs', 'school.coep', 'school.mit', 'address.pune', 'address.Nanded']
df.loc[df['data'].isin(target), 'required'] = 'Yes'
output:
data id required
0 Highschool.sg 34 NaN
1 school 56 ['sggs', 'coep', 'mit']
2 school.sggs 55 Yes
3 school.coep 34 Yes
4 school.mit 23 Yes
5 address 60 ['pune', 'Nanded']
6 address.pune 34 Yes
7 address.Nanded 56 Yes
8 address.mumbai 100 NaN
CodePudding user response:
You can use ast.literal_eval
to convert your string lists to actual lists and operate str.contains
on the list of exploded values of df.required
:
from ast import literal_eval
required = df.required.dropna().apply(literal_eval).explode()
df.loc[df.data.str.contains('|'.join(required)).values, 'required'] = 'Yes'