I currently have dataframe which contains a column holding a dictionary.
data={'id':['1','2','3','4'], 'results':['''[{'env':, 'global', 'name':, 'example1', 'label':, 'find'}, {'env':, 'global', 'name':, 'example2', 'label':, 'test'}]''',
'''[{'env':, 'global', 'name':, 'example2', 'label':, 'test'}]''',
'''[{'env': 'global', 'name': 'example2', 'label': 'test'}, {'env': 'global', 'name': 'example3', 'label': 'test'}]''',
'''[{'env': 'global', 'name': 'test1', 'label': 'find'}]'''], 'start':[0,0,0,0], 'limit':[100,100,100,100], 'size':[2,1,2,1]}
Is it possible to generate flags from the 'results' IF a word is present in that dictionary?
For example a flag_find if the word 'find' is present; flag_test if the word 'test' is present; and flag_example if the word 'example' is present
Desired output; three columns in bold
CodePudding user response:
I have provided a function based off the data you gave me. just enter the dat followed by the flag.
Here it is:
data={'id':['1','2','3','4'], 'results':['''[{'env':, 'global', 'name':, 'example1', 'label':, 'find'}, {'env':, 'global', 'name':, 'example2', 'label':, 'test'}]''',
'''[{'env':, 'global', 'name':, 'example2', 'label':, 'test'}]''',
'''[{'env': 'global', 'name': 'example2', 'label': 'test'}, {'env': 'global', 'name': 'example3', 'label': 'test'}]''',
'''[{'env': 'global', 'name': 'test1', 'label': 'find'}]'''], 'start':[0,0,0,0], 'limit':[100,100,100,100], 'size':[2,1,2,1]}
def find_flags(data,flag):
li=[]
for x in data["id"]:
a=data["results"][int(x)-1]
b=a.split("\'")
for y in range(1,len(b)-1,2):
if flag in b[y]:
if x not in li:
li.append(x)
return(li)
print(find_flags(data,"test"))
# returns ['1','2','3','4']
print(find_flags(data,"find"))
# returns ['1','4']
print(find_flags(data,"example"))
# returns ['1','2','3']
CodePudding user response:
Assuming the "results" column is indeed of string type:
df2 = (df['results'].str.extractall("(?P<flag_find>:,? 'find')|(?P<flag_test>:,? 'test')|(?P<flag_example>:,? 'example\d ')")
.notna().groupby(level=0).max()
.replace({True: 'Y', False: ''})
)
df.join(df2)
output:
id results start limit size flag_find flag_test flag_example
0 1 [...] 0 100 2 Y Y Y
1 2 [...] 0 100 1 Y Y
2 3 [...] 0 100 2 Y Y
3 4 [...] 0 100 1 Y
CodePudding user response:
We can use loc
and contains
to get the expected result :
>>> df.loc[df["results"].astype(str).str.contains('find'), 'flag_find'] = 'Y'
>>> df.loc[df["results"].astype(str).str.contains('test'), 'flag_test'] = 'Y'
>>> df.loc[df["results"].astype(str).str.contains('example'), 'flag_example']='Y'
>>> df
id results start limit size flag_find flag_test flag_example
0 1 [{'env':, 'glob... 0 100 2 Y Y Y
1 2 [{'env':, 'glob... 0 100 1 NaN Y Y
2 3 [{'env': 'globa... 0 100 2 NaN Y Y
3 4 [{'env': 'globa... 0 100 1 Y Y NaN