Home > database >  Highlight partial/complete matches from 2 columns in google sheets
Highlight partial/complete matches from 2 columns in google sheets

Time:02-03

I have a list of names starting in the first column and a second list starting in the second column. Some of the names are the exact same while others also have a shorter/nickname, include middle names/initials, or a combination of both in the columns. For example:

List One List Two
John Smith Anthony Moon
Anthony James Moon Edward Flores
Edward Flores (Eddie) John Smith
Angelica C Mary Roach Roach (Angie) Angelica Roach

I'm trying to use a conditional formatting to highlight cells in the second column to names that are in the first column. However, it only sometimes works because either list could have more of their name. Cells that just include a middle name tend to highlight correctly, but if there are names within parentheses or include an initial, highlighting doesn't seem to work.

Any help would be appreciated.

edit: Example Sheet with manually highlighted cells.

edit 2: I realized that column 2 will always just be first and last name. I'm not sure if this will make things easier or not. Just wanted to include it and I've also updated the example sheet to reflect this.

CodePudding user response:

I duplicated your tab and applied the following conditional formatting custom formula to the range B2:B

=COUNTIF(INDEX(LAMBDA(dist,dist/IF(LEN(A$2:A)>LEN(B2),LEN(A$2:A),LEN(B2)))(MAP(A$2:A,LAMBDA(ζ,INDEX(REDUCE({0,SEQUENCE(1,LEN(ζ))},SEQUENCE(LEN(B2)),LAMBDA(a,c,{INDEX(a,1) 1,SCAN(INDEX(a,1) 1,SEQUENCE (1,LEN(ζ)),LAMBDA(x,y,MIN(INDEX(a,y 1) 1,x 1,INDEX(a,y) 1*(NOT(EXACT(MID(B2,c,1),MID(ζ,y,1)))))))})),LEN(ζ) 1))))),"<="&C$1)

Where C1 is a number that specifies how similar the strings should be in order to be highlighted. If it's 0 it will only highlight the exact matches, if it's 1 it will highlight everything. 0.52 seems to be the sweet spot for the data you provided.

This uses the Levenshtein distance formula developed by Astral.

If the sheet gets too slow you may want to remove those formulas from the conditional formatting, enter them in a helper column and refer to that column instead.

  • Related