I have a list such as
[ apple, orange, banana ]
that I want to match with lists in a dataframe column and create a new column with the matched list of elements
fruits | match |
---|---|
[apple, banana, berry ] | [apple, banana] |
[orange] | [orange] |
How can I accomplish this in an efficient way? Thanks.
CodePudding user response:
To have a working example, I assumed that your data are lists of strings:
import pandas as pd

df = pd.DataFrame({
    'fruits': [['apple', 'orange', 'berry'], ['orange']]
})
ml = ['apple', 'orange', 'banana']
Then I created a function that returns the list of matching elements, or 0 if there is no match:
def matchFruits(row):
    result = []
    # Keep each fruit of the row that also appears in ml
    for fruit in row['fruits']:
        if fruit in ml:
            result.append(fruit)
    return result if len(result) > 0 else 0
For list comprehension aficionados, the loop can be replaced with:
    result = [fruit for fruit in row['fruits'] if fruit in ml]
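Put together, the list comprehension variant of the function might look like the sketch below. The plain dicts used to try it out are only stand-ins for DataFrame rows (an assumption made to keep the snippet free of dependencies); a row passed in by df.apply(..., axis=1) is indexed the same way.

```python
ml = ['apple', 'orange', 'banana']

def matchFruits(row):
    # Keep only the fruits that also appear in ml, preserving their order
    result = [fruit for fruit in row['fruits'] if fruit in ml]
    # Return 0 when nothing matches, mirroring the loop version
    return result if len(result) > 0 else 0

# Plain dicts stand in for DataFrame rows here
print(matchFruits({'fruits': ['apple', 'orange', 'berry']}))  # ['apple', 'orange']
print(matchFruits({'fruits': ['berry']}))                     # 0
```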
Finally, I applied this function to the whole DataFrame with axis=1 (that is, row by row) to add a new column to the initial DataFrame:
df["match"] = df.apply(matchFruits, axis=1)
The output is the following. It differs from your example, since your result was 0 even though 'orange' was in both lists. If this is not the requested behavior, please edit your question.
                   fruits            match
0  [apple, orange, berry]  [apple, orange]
1                [orange]         [orange]
CodePudding user response:
You can try apply with a filter:
lst = ['apple', 'orange', 'banana']
df['match'] = df['fruits'].apply(lambda ls: list(filter(lambda x: x in lst, ls)))
print(df)
                   fruits            match
0  [apple, orange, berry]  [apple, orange]
1                [orange]         [orange]
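Since the question asks about efficiency, one possible refinement is to turn the reference list into a set, so that each membership test is a hash lookup rather than a scan of the list. This is only a sketch under the same assumption that the column holds lists of strings; the name allowed is introduced here for illustration and is not from the answers above.

```python
import pandas as pd

df = pd.DataFrame({
    'fruits': [['apple', 'orange', 'berry'], ['orange']]
})

# Hypothetical name: a set version of the reference list for fast lookups
allowed = {'apple', 'orange', 'banana'}

# Apply on the single column (no axis=1 needed) and keep the original order
df['match'] = df['fruits'].apply(
    lambda fruits: [fruit for fruit in fruits if fruit in allowed]
)
print(df)
```

Unlike matchFruits, this variant leaves an empty list rather than 0 when a row has no match. The gain from the set is likely negligible for a three-item list and only becomes noticeable when the reference list is long.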