Home > database >  Look for 100% matches,partial matches and dismatches between python list and list column in Pandas
Look for 100% matches,partial matches and dismatches between python list and list column in Pandas

Time:10-21

I know how to compare lists but working with unflatten data in Pandas is beyond my skills..anyway.. I have:

list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

and

df = pd.DataFrame({
"col1":["a", "b", "c", "d"], 
"col2":[[1, 2, 3, 4, 5], [20, 30, 100], [1, 3, 20], [20,30]]
})

I want to compare the list with the df and print rows that contain 100% matches, partial matches and dismatches. Desired outcome is then this:

"100% matches are in rows: a"
"Partial matches are in rows: c"
"Dismatches are in rows: b, d"

Python 3 solution would be much appreciated.Thanks!

CodePudding user response:

You can use something like:

lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

s = df['col2'].explode().isin(lst).groupby(level=0).mean()

import numpy as np
group = np.select([s.eq(1), s.eq(0)],
                  ['100% matches', 'Dismatches'],
                  'Partial matches')

df['col1'].groupby(group).agg(', '.join)

output:

100% matches          a
Dismatches         b, d
Partial matches       c
Name: col1, dtype: object

CodePudding user response:

Code:

###INPUT
lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
df = pd.DataFrame({
"col1": ['a', 'b', 'c', 'd'], 
"col2":[[1, 2, 3, 4, 5], [20, 30, 100], [1, 3, 20], [20,30]]})
 

#Creating another list to define categorize
cat = ['100% matches are in rows:', 'Partial matches are in rows:', 'Dismatches are in rows:']


#Using lambda fun reading DF row by row and compareing row list with actaul list
#If difference is 0 pick first category from cat list
#if difference is 1 pick second category from cat list
# if difference more than one pick the last category from cat list

df['col1'].groupby(df['col2'].apply(lambda row: cat[-1] if len(set(row).difference(lst)) > 1 else cat[len(set(row).difference(lst))])).agg(','.join)

Output:

col2
100% matches are in rows         a
Dismatches are in rows         b,d
Partial matches are in rows      c
Name: col1, dtype: object
  • Related