Home > Software design >  How to match between row values
How to match between row values

Time:10-20

serial name match
1 John 5,6,8
2 Steve 1,7
3 Kevin 4
4 Kevin 3
5 John 1,6,8
6 Johnn 1,5,8
7 Steves 2
8 John 1,5,6

Need to check and match the name of each row with the name of the row serial number mentioned in it's match column. Keep the serials matching else remove it. If nothing is matching then put null.

serial name match updated_match
1 John 5,6,8 5,8
2 Steve 1,7
3 Kevin 4 4
4 Kevin 3 3
5 John 1,6,8 1,8
6 Johnn 1,5,8
7 Steves 2
8 John 1,5,6 1,5

CodePudding user response:

Convert values of serial column to strings,then mapping aggregate sets to Series wth same size like original, split column match and get difference with intersection of sets, last sorting and join back to strings:

s = df['serial'].astype(str)
sets = df['name'].map(s.groupby(df['name']).agg(set))
match = df['match'].str.split(',')

df['updated_match'] = [','.join(sorted(b.difference([c]).intersection(a))) 
                        for a, b, c in zip(match, sets, s)]
print (df)
   serial    name  match updated_match
0       1    John  5,6,8           5,8
1       2   Steve    1,7              
2       3   Kevin      4             4
3       4   Kevin      3             3
4       5    John  1,6,8           1,8
5       6   Johnn  1,5,8              
6       7  Steves      2              
7       8    John  1,5,6           1,5

CodePudding user response:

You can use mappings to determine which match have the same names:

# ensure we have strings in serial
df = df.astype({'serial': 'str'})

# split and explode he individual serials
s = df['match'].str.split(',').explode()
# make a mapper: serial -> name
mapper = df.set_index('serial')['name']
# get exploded names
s2 = df.loc[s.index, 'serial'].map(mapper)

# keep only match with matching names
# aggregate back as string
df['updated_match'] = (s[s.map(mapper).eq(s2)]
                       .groupby(level=0).agg(','.join)
                       .reindex(df.index, fill_value='')
                      )

output:

  serial    name  match updated_match
0      1    John  5,6,8           5,8
1      2   Steve    1,7              
2      3   Kevin      4             4
3      4   Kevin      3             3
4      5    John  1,6,8           1,8
5      6   Johnn  1,5,8              
6      7  Steves      2              
7      8    John  1,5,6           1,5

Alternative with a groupby.apply:

df['updated_match'] = (df.groupby('name', group_keys=False)
                         .apply(lambda g: g['match'].str.split(',').explode()
                                          .loc[lambda x: x.isin(g['serial'])]
                                          .groupby(level=0).agg(','.join))
                         .reindex(df.index, fill_value='')
                      )
  • Related