Python flag if word is in a dataframe dictionary-CodePudding

I currently have dataframe which contains a column holding a dictionary.

data={'id':['1','2','3','4'], 'results':['''[{'env':, 'global', 'name':, 'example1', 'label':, 'find'}, {'env':, 'global', 'name':, 'example2', 'label':, 'test'}]''', 
                             '''[{'env':, 'global', 'name':, 'example2', 'label':, 'test'}]''',
                             '''[{'env': 'global', 'name': 'example2', 'label': 'test'}, {'env': 'global', 'name': 'example3', 'label': 'test'}]''',
                             '''[{'env': 'global', 'name': 'test1', 'label': 'find'}]'''], 'start':[0,0,0,0], 'limit':[100,100,100,100], 'size':[2,1,2,1]}

Is it possible to generate flags from the 'results' IF a word is present in that dictionary?

For example a flag_find if the word 'find' is present; flag_test if the word 'test' is present; and flag_example if the word 'example' is present

Desired output; three columns in bold

CodePudding user response：

I have provided a function based off the data you gave me. just enter the dat followed by the flag.

Here it is:

data={'id':['1','2','3','4'], 'results':['''[{'env':, 'global', 'name':, 'example1', 'label':, 'find'}, {'env':, 'global', 'name':, 'example2', 'label':, 'test'}]''', 
                             '''[{'env':, 'global', 'name':, 'example2', 'label':, 'test'}]''',
                             '''[{'env': 'global', 'name': 'example2', 'label': 'test'}, {'env': 'global', 'name': 'example3', 'label': 'test'}]''',
                             '''[{'env': 'global', 'name': 'test1', 'label': 'find'}]'''], 'start':[0,0,0,0], 'limit':[100,100,100,100], 'size':[2,1,2,1]}
def find_flags(data,flag):
  li=[]
  for x in data["id"]:
    a=data["results"][int(x)-1]
    b=a.split("\'")
    for y in range(1,len(b)-1,2):
      if flag in b[y]:
        if x not in li:
          li.append(x)
  return(li)

print(find_flags(data,"test"))
# returns ['1','2','3','4']
print(find_flags(data,"find"))
# returns ['1','4']
print(find_flags(data,"example"))
# returns ['1','2','3']

CodePudding user response：

Assuming the "results" column is indeed of string type:

df2 = (df['results'].str.extractall("(?P<flag_find>:,? 'find')|(?P<flag_test>:,? 'test')|(?P<flag_example>:,? 'example\d ')")
         .notna().groupby(level=0).max()
         .replace({True: 'Y', False: ''})
       )

df.join(df2)

output:

  id results  start  limit  size flag_find flag_test flag_example
0  1   [...]      0    100     2         Y         Y            Y
1  2   [...]      0    100     1                   Y            Y
2  3   [...]      0    100     2                   Y            Y
3  4   [...]      0    100     1         Y

CodePudding user response：

We can use loc and contains to get the expected result :

>>> df.loc[df["results"].astype(str).str.contains('find'), 'flag_find'] = 'Y' 
>>> df.loc[df["results"].astype(str).str.contains('test'), 'flag_test'] = 'Y' 
>>> df.loc[df["results"].astype(str).str.contains('example'), 'flag_example']='Y' 

>>> df
   id results            start limit size  flag_find flag_test flag_example
0  1  [{'env':, 'glob... 0     100   2     Y         Y         Y
1  2  [{'env':, 'glob... 0     100   1     NaN       Y         Y
2  3  [{'env': 'globa... 0     100   2     NaN       Y         Y
3  4  [{'env': 'globa... 0     100   1     Y         Y         NaN