Home > Enterprise >  Check value present in dataframe
Check value present in dataframe

Time:09-22

I want to check if certains value present in one column and check in another column using python if we get data then mark as yes

We have below dataframe

import pandas as pd
import numpy as np
a1=["Highschool.sg","school","school.sggs","school.coep","school.mit","address","address.pune","address.Nanded","address.mumbai"]
a2=[34,56,55,34,23,60,34,56,100]
a3=[np.nan,str(["sggs","coep","mit"]),np.nan,np.nan,np.nan,str(["pune","Nanded"]),np.nan,np.nan,np.nan]
df =pd.DataFrame(list(zip(a1,a2,a3)),columns=['data','id','required'])

enter image description here

Here if we get any value like ['sggs','coep','mit'] then need to check in column if any keyword match with this value then mark as YES

Expected output

enter image description here

CodePudding user response:

You can extract the words (or use any method to get individual strings), concatenate to 'data', and use the list of words to find the matches for boolean indexing:

target = (df['data'] '.'
           df['required'].str.extractall('(\w )')[0].droplevel(1)
         ).dropna()
# ['school.sggs', 'school.coep', 'school.mit', 'address.pune', 'address.Nanded']

df.loc[df['data'].isin(target), 'required'] = 'Yes'

output:

             data   id                 required
0   Highschool.sg   34                      NaN
1          school   56  ['sggs', 'coep', 'mit']
2     school.sggs   55                      Yes
3     school.coep   34                      Yes
4      school.mit   23                      Yes
5         address   60       ['pune', 'Nanded']
6    address.pune   34                      Yes
7  address.Nanded   56                      Yes
8  address.mumbai  100                      NaN

CodePudding user response:

You can use ast.literal_eval to convert your string lists to actual lists and operate str.contains on the list of exploded values of df.required:

from ast import literal_eval
required = df.required.dropna().apply(literal_eval).explode()
df.loc[df.data.str.contains('|'.join(required)).values, 'required'] = 'Yes'
  • Related