Check whether the elements of a column with list values are present in another list

I have a DataFrame with a column called antecedent whose values are lists. For each row, I would like to return True if all elements of that list are present in another list called itemset, and False otherwise.

Example itemset :

['Investimento Fundos_commodities', 'Investimento Fundos Multimercado','Emprestimo _educacao', 'Investimento CDB']

Example antecedent:

['Investimento Fundos_commodities', 'Investimento Fundos']
# Desired Output : False, because the value 'Investimento Fundos' is not present in itemset.

Another example antecedent:

['Investimento Fundos_commodities', 'Investimento CDB']
# Desired Output : True, because both values are present in itemset.

I was only able to get the elements that are present in itemset. I could not turn this into a check that returns True or False depending on whether any element is missing.

df_itemset['antecedent'].map(lambda antecedents : [x for x in antecedents if x in itemset])

# Output :
# 16     [Investimento Fundos_commodities]
# 23     [Investimento Fundos_commodities]
# 4     [Investimento Fundos_commodities]
# 26     [Investimento Fundos_commodities]
# 30     [Investimento Fundos_commodities]
#                      ...
# 138                   [Investimento CDB]
# 139                   [Investimento CDB]
# 140                   [Investimento CDB]
# 141                   [Investimento CDB]
# 142                   [Investimento CDB]
# Name: antecedent, Length: 99, dtype: object

CodePudding user response：

Convert each list to a set and use the issubset predicate:

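import pandas as pd

# itemset as given in the question
itemset = ['Investimento Fundos_commodities', 'Investimento Fundos Multimercado',
           'Emprestimo _educacao', 'Investimento CDB']
data = {'antecedent': [['Investimento Fundos_commodities', 'Investimento Fundos'],
                       ['Investimento Fundos_commodities', 'Investimento CDB']]}
df = pd.DataFrame(data)

# True only when every element of the row's list is in itemset
df['issubset'] = df['antecedent'].apply(lambda x: set(x).issubset(itemset))
print(df)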

# Output:
                                               antecedent  issubset
0  [Investimento Fundos_commodities, Investimento Fundos]     False
1     [Investimento Fundos_commodities, Investimento CDB]      True
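
If itemset is large or the DataFrame has many rows, it may be worth building the lookup set once instead of passing the list on every call. The following is only a sketch of that variant: the name itemset_lookup and the extra empty-list row are my own additions for illustration, not part of the question. Note that an empty antecedent list counts as a subset, so it yields True.

```python
import pandas as pd

# itemset as given in the question
itemset = ['Investimento Fundos_commodities', 'Investimento Fundos Multimercado',
           'Emprestimo _educacao', 'Investimento CDB']

# Hypothetical helper name: build the lookup set once, outside the row-wise call
itemset_lookup = frozenset(itemset)

df = pd.DataFrame({'antecedent': [
    ['Investimento Fundos_commodities', 'Investimento Fundos'],
    ['Investimento Fundos_commodities', 'Investimento CDB'],
    [],  # illustrative edge case: an empty list is a subset of any set
]})

# issuperset accepts any iterable, so the row's list does not need converting
df['issubset'] = df['antecedent'].map(lambda items: itemset_lookup.issuperset(items))

result = df['issubset'].tolist()
print(result)  # [False, True, True]
```

If an empty antecedent should count as False instead, the lambda would need an extra check such as bool(items).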

CodePudding user response：

You could try something like this:

df_itemset['antecedent'].map(
  lambda antecedents : len([x for x in antecedents if x not in itemset])
) == 0

It counts the antecedents that are not in itemset. That count is 0 exactly when every element is present, so comparing it with 0 gives True for rows where your condition holds and False otherwise.
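
A variation on the same idea, as a rough sketch: Python's built-in all() stops at the first element that is missing, so it avoids building the intermediate list. The helper name all_in_itemset and the sample DataFrame are assumptions made for illustration. The resulting boolean Series can also be used as a mask to keep only the matching rows.

```python
import pandas as pd

# itemset as given in the question
itemset = ['Investimento Fundos_commodities', 'Investimento Fundos Multimercado',
           'Emprestimo _educacao', 'Investimento CDB']

# Small stand-in for the asker's df_itemset (the real data is not shown)
df_itemset = pd.DataFrame({'antecedent': [
    ['Investimento Fundos_commodities', 'Investimento Fundos'],
    ['Investimento Fundos_commodities', 'Investimento CDB'],
]})

allowed = frozenset(itemset)  # set membership tests are O(1) on average


def all_in_itemset(antecedents):
    """Hypothetical helper: True if every antecedent is in itemset."""
    # all() short-circuits on the first element that is not in the set
    return all(x in allowed for x in antecedents)


mask = df_itemset['antecedent'].map(all_in_itemset)
matching_rows = df_itemset[mask]  # keep only rows that pass the check

print(mask.tolist())  # [False, True]
```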