I'm working with two dataframes, each with a column holding lists of strings, and I need to know, row by row, whether all elements of one list are contained in the other list.
Initially my values were plain strings. Here's an example:
df1
num
0 [10 2]
1 [120]
2 [2 5 8]
df2
num
0 [10 2]
1 [60]
2 [2 5]
Then I used df1['num'].str.split() to get the elements in the string into a list:
df1
num
0 [10, 2]
1 [120]
2 [2, 5, 8]
After that I tried using all(item in df1['num'].str.split() for item in df2['num'].str.split()), but it raises:
TypeError: unhashable type: 'list'
The desired output would be:
0 True
1 False
2 True
How can I do this?
CodePudding user response:
import pandas as pd
df1 = pd.DataFrame({
'num': [['10', '2'], ['120'], ['2', '5', '8']]
})
df2 = pd.DataFrame({
'num': [['10', '2'], ['60'], ['2', '5']]
})
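# Expand each list into columns, then transpose so that each column holds one original row.
# Shorter lists are padded with missing values.
df1_str = pd.DataFrame(df1['num'].tolist()).T
df2_str = pd.DataFrame(df2['num'].tolist()).T
# Drop the padding before testing membership, column by column.
lst = [all(df2_str[col].dropna().isin(df1_str[col])) for col in df2_str.columns]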
print(lst)
CodePudding user response:
You can use set operations here:
pd.Series([set(a)>=set(b) for a,b in zip(df1['num'], df2['num'])], index=df1.index)
output:
0 True
1 False
2 True
dtype: bool
Or to assign to one of the dataframes:
df1['test'] = [set(a)>=set(b) for a,b in zip(df1['num'], df2['num'])]
output:
num test
0 [10, 2] True
1 [120] False
2 [2, 5, 8] True
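As a self-contained sketch of the set comparison above, the snippet below rebuilds the sample frames from the question and runs the same superset test. The helper name `contains_all` is hypothetical (it is not part of pandas), and the sketch assumes both frames have the same number of rows in the same order.

```python
import pandas as pd

# Sample data from the question, already split into lists of strings.
df1 = pd.DataFrame({'num': [['10', '2'], ['120'], ['2', '5', '8']]})
df2 = pd.DataFrame({'num': [['10', '2'], ['60'], ['2', '5']]})


def contains_all(container, items):
    """Return True if every element of `items` also appears in `container`."""
    # set(a) >= set(b) is the superset test; duplicates are ignored.
    return set(container) >= set(items)


# Pair the rows by position (assumes both frames are aligned row for row).
result = pd.Series(
    [contains_all(a, b) for a, b in zip(df1['num'], df2['num'])],
    index=df1.index,
)
print(result.tolist())
```

Because the comparison goes through sets, repeated elements count only once, and the elements are compared as strings, so '2' and '02' would not match.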
CodePudding user response:
Use the issubset method, converting the values from df2['num'] to sets:
df1['new'] = [set(b).issubset(a) for a,b in zip(df1['num'], df2['num'])]
print(df1)
num new
0 [10, 2] True
1 [120] False
2 [2, 5, 8] True
If the values have not been split yet, modify the solution as follows:
df1['test'] = [set(b.split()).issubset(a.split()) for a,b in zip(df1['num'], df2['num'])]
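A minimal runnable sketch of the unsplit variant, assuming the raw values are space-separated strings as in the question. The frame names `raw1` and `raw2` are hypothetical, chosen only to keep them distinct from the already-split frames.

```python
import pandas as pd

# Raw, unsplit values (space-separated strings), as in the original question.
raw1 = pd.DataFrame({'num': ['10 2', '120', '2 5 8']})
raw2 = pd.DataFrame({'num': ['10 2', '60', '2 5']})

# str.split() with no argument splits on any run of whitespace.
# issubset accepts any iterable, so only the left-hand side needs to be a set.
raw1['test'] = [
    set(b.split()).issubset(a.split())
    for a, b in zip(raw1['num'], raw2['num'])
]
print(raw1)
```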