Compare each element in a list with a column of lists in a DataFrame (Python)

I have a list such as

[ apple, orange, banana ]

that I want to match against the lists in a DataFrame column, creating a new column that holds the matched elements:

fruits	match
[apple, banana, berry ]	[apple, banana]
[orange]	[orange]

How can I accomplish this in an efficient way? Thanks.

CodePudding user response：

To have a working example, I assumed that your data are lists of strings:

import pandas as pd

df = pd.DataFrame({
    'fruits': [['apple', 'orange', 'berry'], ['orange']]
})

ml = ['apple', 'orange', 'banana']

Then I created a function that returns the list of matching elements, or 0 if there is no match:

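def matchFruits(row):
    # Collect the fruits of this row that also appear in ml, in their original order
    result = []
    for fruit in row['fruits']:
        if fruit in ml:
            result.append(fruit)

    # Return 0 instead of an empty list when nothing matched
    return result if len(result) > 0 else 0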

For list comprehension aficionados, the loop can be replaced with:

result = [fruit for fruit in row['fruits'] if fruit in ml]
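As a complete function, the comprehension variant might look like the sketch below. The name match_fruits_comprehension and the extra ['kiwi'] row are only there for illustration (they are not part of the question); the row is added to show the no-match case.

```python
import pandas as pd

ml = ['apple', 'orange', 'banana']

def match_fruits_comprehension(row):
    # Hypothetical name: same logic as the loop version, written as a comprehension
    result = [fruit for fruit in row['fruits'] if fruit in ml]
    # Same convention as the loop version: 0 when nothing matches
    return result if result else 0

# Illustrative data; the ['kiwi'] row is an assumption added to show the 0 case
df_demo = pd.DataFrame({
    'fruits': [['apple', 'orange', 'berry'], ['orange'], ['kiwi']]
})

matches = df_demo.apply(match_fruits_comprehension, axis=1)
print(matches)
```

If you would rather get an empty list than 0 for rows without a match, simply return result directly.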

Finally, I applied this function to every row of the DataFrame (axis = 1) to add a new column to the initial DataFrame:

df["match"] = df.apply(matchFruits, axis = 1)

The output is shown below. It differs from your example, since your result was 0 even though 'orange' was in both lists. If this is not the behavior you want, please edit your question.

                   fruits            match
0  [apple, orange, berry]  [apple, orange]
1                [orange]         [orange]

CodePudding user response：

You can try applying a filter:

lst = ['apple', 'orange', 'banana']

df['match'] = df['fruits'].apply(lambda ls: list(filter(lambda x: x in lst, ls)))

print(df)

                   fruits            match
0  [apple, orange, berry]  [apple, orange]
1                [orange]         [orange]
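Since the question asks about efficiency, one possible tweak is to convert the reference list to a set first, so that each membership test is a hash lookup instead of a scan through the list. This is only a sketch under the assumption that the elements are hashable (strings are); the name wanted is mine, and whether it is noticeably faster depends on how long the reference list is.

```python
import pandas as pd

df = pd.DataFrame({
    'fruits': [['apple', 'orange', 'berry'], ['orange']]
})
lst = ['apple', 'orange', 'banana']

# Build the set once; "x in wanted" is O(1) on average, versus O(len(lst)) for a list
wanted = set(lst)

# Keep the elements of each row's list that are in the set, preserving their order
df['match'] = df['fruits'].apply(lambda ls: [x for x in ls if x in wanted])

print(df)
```

For a three-element list the difference is negligible, but it can matter when the reference list has thousands of entries.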