I have a DataFrame with a column called antecedent (list values), and I would like to return True or False for each row depending on whether all of its elements are present in another list called itemset.
Example itemset:
['Investimento Fundos_commodities', 'Investimento Fundos Multimercado','Emprestimo _educacao', 'Investimento CDB']
Example antecedent:
['Investimento Fundos_commodities', 'Investimento Fundos']
# Desired output: False, because the value 'Investimento Fundos' is not present in itemset.
Another example antecedent:
['Investimento Fundos_commodities', 'Investimento CDB']
# Desired output: True, because both values are present in itemset.
I was only able to print the elements that are present in itemset, but I could not turn this into a check that returns True or False depending on whether any element is missing.
df_itemset['antecedent'].map(lambda antecedents : [x for x in antecedents if x in itemset])
# Output :
# 16 [Investimento Fundos_commodities]
# 23 [Investimento Fundos_commodities]
# 4 [Investimento Fundos_commodities]
# 26 [Investimento Fundos_commodities]
# 30 [Investimento Fundos_commodities]
...
# 138 [Investimento CDB]
# 139 [Investimento CDB]
# 140 [Investimento CDB]
# 141 [Investimento CDB]
# 142 [Investimento CDB]
# Name: antecedent, Length: 99, dtype: object
CodePudding user response:
Use set and the issubset predicate:
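import pandas as pd
itemset = ['Investimento Fundos_commodities', 'Investimento Fundos Multimercado',
           'Emprestimo _educacao', 'Investimento CDB']  # list from the question
data = {'antecedent': [['Investimento Fundos_commodities', 'Investimento Fundos'],
                       ['Investimento Fundos_commodities', 'Investimento CDB']]}
df = pd.DataFrame(data)
# True only when every element of the row's list is found in itemset
df['issubset'] = df['antecedent'].apply(lambda x: set(x).issubset(itemset))
print(df)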
# Output:
antecedent issubset
0 [Investimento Fundos_commodities, Investimento Fundos] False
1 [Investimento Fundos_commodities, Investimento CDB] True
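A possible variation on the approach above, as a minimal sketch: convert itemset to a frozenset once, so the lambda does not have to work against the plain list for every row. The name itemset_lookup and the extra empty-list row are assumptions added for illustration; they are not part of the original question.

```python
import pandas as pd

# itemset copied from the question
itemset = ['Investimento Fundos_commodities', 'Investimento Fundos Multimercado',
           'Emprestimo _educacao', 'Investimento CDB']

# Hypothetical helper name: build the lookup set a single time
itemset_lookup = frozenset(itemset)

df = pd.DataFrame({'antecedent': [
    ['Investimento Fundos_commodities', 'Investimento Fundos'],
    ['Investimento Fundos_commodities', 'Investimento CDB'],
    [],  # illustrative edge case: an empty antecedent
]})

# issuperset accepts any iterable, so each row's list can be passed directly
df['issubset'] = df['antecedent'].map(itemset_lookup.issuperset)

result = df['issubset'].tolist()
print(result)
```

Note that an empty antecedent counts as a subset of anything, so that row evaluates to True. If empty lists should be treated as False, that would need an explicit extra check.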
CodePudding user response:
You could try something like
df_itemset['antecedent'].map(
lambda antecedents : len([x for x in antecedents if x not in itemset])
) == 0
It calculates the number of antecedents which are not in itemset, so it is > 0 if and only if your condition does not hold.
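The same idea can probably be expressed without building an intermediate list, by using all() with a generator, which stops at the first missing element. This is only a sketch: the function name all_in_itemset and the set itemset_lookup are made up for illustration.

```python
# itemset copied from the question
itemset = ['Investimento Fundos_commodities', 'Investimento Fundos Multimercado',
           'Emprestimo _educacao', 'Investimento CDB']

# Hypothetical name: a set gives constant-time membership tests
itemset_lookup = set(itemset)


def all_in_itemset(antecedents, allowed=itemset_lookup):
    """Return True if every element of antecedents is in allowed."""
    # all() short-circuits on the first element that is not found
    return all(x in allowed for x in antecedents)


examples = [
    ['Investimento Fundos_commodities', 'Investimento Fundos'],
    ['Investimento Fundos_commodities', 'Investimento CDB'],
]
flags = [all_in_itemset(a) for a in examples]
print(flags)
```

With the DataFrame from the question, this function could be passed directly to map, as in df_itemset['antecedent'].map(all_in_itemset).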