How can I create the column exactly_in_colb? It should list every item of lookfor that exactly matches an element of colb.
lookfor = ["ap", "bana", "pear", "taste"]
df  col     colb
0   apple   ["app", "pl"]
1   banana  ["bana"]
2   pear    ["pear"]
Expected Output
df  col     colb                exactly_in_colb
0   apple   ["app", "pl"]
1   banana  ["bana", "yellow"]  ["bana"]
2   pear    ["pear", "taste"]   ["pear", "taste"]
Pandas treats colb as an np.array. If I convert it to a string, the code runs but no longer looks for exact matches.
CodePudding user response:
Use set.intersection (note that a set keeps neither the original order nor duplicates):
lookfor = ["ap", "bana", "pear", "taste"]
df['exactly_in_colb'] = df['colb'].apply(lambda x: list(set(x).intersection(lookfor)))
print(df)
   df     col           colb exactly_in_colb
0   0   apple      [app, pl]              []
1   1  banana         [bana]          [bana]
2   2    pear  [pear, taste]   [pear, taste]
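Because a set is unordered, the order of the matches returned by set.intersection is not guaranteed. A minimal self-contained sketch, assuming the sample data from the question and assuming alphabetical order is acceptable, wraps the result in sorted() so the output is reproducible:

```python
import pandas as pd

lookfor = ["ap", "bana", "pear", "taste"]

# Sample frame rebuilt from the question (assumed: colb holds plain lists).
df = pd.DataFrame({
    "col": ["apple", "banana", "pear"],
    "colb": [["app", "pl"], ["bana"], ["pear", "taste"]],
})

# Intersect each row's items with lookfor; sorted() fixes the order,
# since the iteration order of a set is not guaranteed.
df["exactly_in_colb"] = df["colb"].apply(
    lambda items: sorted(set(items).intersection(lookfor))
)

print(df)
```

If the original order of colb matters, a list comprehension over each row is the safer choice.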
CodePudding user response:
You need a list comprehension and set lookup (for efficiency):
lookfor = ["ap", "bana", "pear", "taste"]
# make a set for efficient lookup
S = set(lookfor)
# looping through list to keep original order
df['exactly_in_colb'] = [[v for v in lst if v in S] for lst in df['colb']]
output:
      col           colb exactly_in_colb
0   apple      [app, pl]              []
1  banana         [bana]          [bana]
2    pear  [pear, taste]   [pear, taste]
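The question mentions that pandas treats colb as an np.array. The same membership test should work on array cells too, because iterating over a NumPy string array yields values that compare and hash like ordinary strings. A hedged sketch, assuming each cell is a one-dimensional array of strings (the name wanted is only illustrative):

```python
import numpy as np
import pandas as pd

lookfor = ["ap", "bana", "pear", "taste"]
wanted = set(lookfor)  # set lookup keeps each membership test cheap

# Assumed layout: every cell of colb is a 1-D NumPy array of strings.
df = pd.DataFrame({
    "col": ["apple", "banana", "pear"],
    "colb": [
        np.array(["app", "pl"]),
        np.array(["bana"]),
        np.array(["pear", "taste"]),
    ],
})

# Keep only exact matches, in their original order; str() turns the
# NumPy string scalars back into plain Python strings.
df["exactly_in_colb"] = [
    [str(v) for v in cell if v in wanted] for cell in df["colb"]
]

print(df)
```

No conversion of the column to a string is needed, so "ap" does not match "app" as a substring.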