Matching an element from a list to a column that holds lists: if a single element matches, return the entire row

If a column holds lists, I want to return the entire row whenever at least one element from our list matches an element in that row's list. For example, we have this data frame:

index             x
0                [apple, orange, strawberry]
1                [blueberry, pear, watermelon]
2                [apple, banana, strawberry]
3                [apple]
4                [strawberry]

And we have our list:
a = ['apple', 'strawberry']
# I am trying to return indices 0, 2, 3 and 4, but currently I can only return indices 3 and 4.
new_DF = df[df['x'].isin(a)]

# This function gets the user input for list 'a'.
# It is here for reference, to show what I am actually trying to do.

def filter_Industries():
    num_of_industries = int(input('How many industries would you like to filter by?\n'))
    list_industries = []
    # range() already advances the loop counter, so no manual i = 0 / i += 1 is needed
    for _ in range(num_of_industries):
        industry = input("Enter the industry:\n")
        list_industries.append(industry)

    return list_industries

a = filter_Industries()
# This is where I try to match the elements of the user's list against the data set.
new_DF = df[df['x'].isin(a)]
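
If you would rather keep the isin call from the attempt above, one option is to explode the lists first so that isin compares single strings, then collapse the result back to one boolean per original row with groupby(level=0).any(). This is a sketch, assuming every cell in x is a list and the index labels are unique; the frame is rebuilt from the example table.

```python
import pandas as pd

# Rebuild the example frame from the question (cells are lists of strings).
df = pd.DataFrame({"x": [
    ["apple", "orange", "strawberry"],
    ["blueberry", "pear", "watermelon"],
    ["apple", "banana", "strawberry"],
    ["apple"],
    ["strawberry"],
]})
a = ["apple", "strawberry"]

# explode() gives one row per list element and repeats the original index label.
exploded = df["x"].explode()

# isin() now compares single strings; groupby(level=0).any() marks an
# original row True if any of its elements matched.
mask = exploded.isin(a).groupby(level=0).any()

new_DF = df[mask]
print(new_DF.index.tolist())  # [0, 2, 3, 4]
```

Because explode repeats each original index label, grouping on level 0 is what maps element-level matches back to whole rows; duplicate labels in the original index would merge rows, hence the uniqueness assumption.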

Answer:

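You can use the Series.apply method: it calls a function on every cell of the column, and you keep the rows where it returns True. Here the function checks whether a row's list has any element in common with the list a. First, create the function: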

a = ["apple", "strawberry"]
a_set = set(a)
def hasCommon(x):
    return len(set(x) & a_set) > 0
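
As an alternative (not a correction), the same check can be written with set.isdisjoint. It accepts any iterable, so the row's list does not need to be turned into a set first, and it can stop at the first shared element. The name has_common is hypothetical:

```python
a = ["apple", "strawberry"]
a_set = set(a)

def has_common(x):
    # isdisjoint() is True when there is no overlap, so negate it.
    # It accepts any iterable, so x does not need to be converted to a set.
    return not a_set.isdisjoint(x)

print(has_common(["apple", "orange"]))    # True
print(has_common(["blueberry", "pear"]))  # False
```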

If the row's list shares at least one element with a, the function returns True. Let's create some dummy data:

import pandas as pd
data = {
  "calories": [["apple", "orange", "strawberry"], ["blueberry", "pear", "watermelon"], ["strawberry", "pear", "watermelon"]],
  "duration": [50, 40,120]
}

# Load the data into a DataFrame object:
df = pd.DataFrame(data)

print(df)

Then we can use it like this:

df[df["calories"].apply(hasCommon)]
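
Applied to the frame from the question (column x) instead of the dummy data, the same approach should return exactly the rows the asker wanted. A minimal runnable sketch, assuming every cell in x is a list of strings:

```python
import pandas as pd

df = pd.DataFrame({"x": [
    ["apple", "orange", "strawberry"],
    ["blueberry", "pear", "watermelon"],
    ["apple", "banana", "strawberry"],
    ["apple"],
    ["strawberry"],
]})
a = ["apple", "strawberry"]
a_set = set(a)

def hasCommon(x):
    # True when the row's list shares at least one element with a
    return len(set(x) & a_set) > 0

# apply() calls hasCommon once per cell and returns a boolean Series,
# which df[...] then uses as a row mask, so entire rows are returned.
new_DF = df[df["x"].apply(hasCommon)]
print(new_DF.index.tolist())  # [0, 2, 3, 4]
```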

Answer:

When you use isin(a), each cell is compared as a whole value against the items of a. At index 0, 1 and 2 the cell is a list (e.g., [apple, orange, strawberry]), and a list is never equal to any single item of a, so those rows are not matched. It worked for rows 3 and 4 because there a single element was compared against the items of a.
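
The whole-value comparison can be illustrated without pandas. Conceptually, isin asks whether each cell, taken as a single value, is one of the items of a, much like Python's in operator on a list. This is a simplified illustration, not pandas' actual implementation; row_0 is just the first row's list:

```python
a = ["apple", "strawberry"]
row_0 = ["apple", "orange", "strawberry"]

# Whole-value membership: is the entire list one of the items of a? No.
print(row_0 in a)                 # False

# A single string can match an item of a directly.
print("apple" in a)               # True

# Element-wise overlap: do the two collections share any element? Yes.
print(bool(set(row_0) & set(a)))  # True
```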

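I suggest converting both a and each row's list to sets and intersecting them; a non-empty intersection means the row has at least one matching element:

a_set = set(a)
rows = []  # positions of rows that share at least one element with a
for i in range(len(df)):
    if a_set & set(df['x'].iloc[i]):  # a non-empty set is truthy
        rows.append(i)
new_DF = df.iloc[rows]  # keep the entire rows, not just the lists

new_DF then contains only the rows whose intersection with a is not an empty set.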
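
One possible reason isin matched rows 3 and 4 in the asker's real data is that those cells held plain strings rather than one-element lists; that is an assumption, since the question does not say. A hedged sketch of a mask that tolerates both shapes (the helper name as_set is hypothetical):

```python
import pandas as pd

a_set = {"apple", "strawberry"}

def as_set(cell):
    # Treat a bare string as a single element; otherwise assume an iterable of strings.
    if isinstance(cell, str):
        return {cell}
    return set(cell)

df = pd.DataFrame({"x": [
    ["apple", "orange", "strawberry"],
    ["blueberry", "pear", "watermelon"],
    ["apple", "banana", "strawberry"],
    "apple",       # plain string instead of a list
    "strawberry",  # plain string instead of a list
]})

# One boolean per row, used as a mask so the whole row is returned.
mask = [bool(as_set(cell) & a_set) for cell in df["x"]]
new_DF = df[mask]
print(new_DF.index.tolist())  # [0, 2, 3, 4]
```

The isinstance check matters because set("apple") would produce a set of characters, so a bare string cell would never match a.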