I know how to compare lists but working with unflatten data in Pandas is beyond my skills..anyway.. I have:
list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
and
df = pd.DataFrame({
"col1":["a", "b", "c", "d"],
"col2":[[1, 2, 3, 4, 5], [20, 30, 100], [1, 3, 20], [20,30]]
})
I want to compare the list
with the df
and print
rows that contain 100% matches, partial matches and dismatches. Desired outcome is then this:
"100% matches are in rows: a"
"Partial matches are in rows: c"
"Dismatches are in rows: b, d"
Python 3 solution would be much appreciated.Thanks!
CodePudding user response:
You can use something like:
lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
s = df['col2'].explode().isin(lst).groupby(level=0).mean()
import numpy as np
group = np.select([s.eq(1), s.eq(0)],
['100% matches', 'Dismatches'],
'Partial matches')
df['col1'].groupby(group).agg(', '.join)
output:
100% matches a
Dismatches b, d
Partial matches c
Name: col1, dtype: object
CodePudding user response:
Code:
###INPUT
lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
df = pd.DataFrame({
"col1": ['a', 'b', 'c', 'd'],
"col2":[[1, 2, 3, 4, 5], [20, 30, 100], [1, 3, 20], [20,30]]})
#Creating another list to define categorize
cat = ['100% matches are in rows:', 'Partial matches are in rows:', 'Dismatches are in rows:']
#Using lambda fun reading DF row by row and compareing row list with actaul list
#If difference is 0 pick first category from cat list
#if difference is 1 pick second category from cat list
# if difference more than one pick the last category from cat list
df['col1'].groupby(df['col2'].apply(lambda row: cat[-1] if len(set(row).difference(lst)) > 1 else cat[len(set(row).difference(lst))])).agg(','.join)
Output:
col2
100% matches are in rows a
Dismatches are in rows b,d
Partial matches are in rows c
Name: col1, dtype: object