Home > database >  RegEx to standardize names Google sheets
RegEx to standardize names Google sheets

Time:01-14

I have thousands of names in column A I must standardize the spelling due to simple mistakes with white spaces and periods.

I have names like

Ajay K Thapar
Ajay K. Thapar

or

Alan T. K. Wan
Alan T K Wan
Alan T. K Wan
Alan T.K. Wan
Alan T.K Wan
Alan T.K.Wan

I am having a hard time getting a regex to

  1. Remove space between single letters with a period or without
Alan T. K. Wan = Alan T.K. Wan
A. C. T. van Duin = A.C.T. van Duin
A. C.T. van Duin = A.C.T. van Duin
  1. Place a period if it is Missing on single letters.
Alan T K. Wan = Alan T.K. Wan
A C T van Duin = A.C.T. van Duin
  1. Add space if missing after the last period.
Alan T.K.Wan = Alan T.K. Wan
A.C.T.van Duin = A.C.T. van Duin
Name Fixed
Ajay K Thapar Ajay K. Thapar
Ajay K. Thapar Ajay K. Thapar
A.T Chamberlain A.T. Chamberlain
A. C. T. van Duin A.C.T. van Duin
Alice P S Kong Alice P.S. Kong
Alan T. K. Wan Alan T.K. Wan
Alan T K Wan Alan T.K. Wan
Alan T. K Wan Alan T.K. Wan
Alan T K.Wan Alan T.K. Wan
Alan T.K. Wan Alan T.K. Wan

CodePudding user response:

use:

=INDEX(BYROW(SPLIT(A1:A10, ". "), LAMBDA(x, TRIM(TEXTJOIN(, 1, 
 IF(REGEXMATCH(x, "^[A-Z]$"), x&".", " "&x&" "))))))

enter image description here

CodePudding user response:

You need to do the replacement in 2 iterations. One of possible solutions:

  1. Normalize dots and spaces after single letters, so that every single letter is followed by a dot and a space.

    pattern: \b([A-Z])([\s.] |$)

    replacement: $1.

  2. Strip spaces between multiple single letters.

    pattern: \b([A-Z]\.) (?=[A-Z]\.)

    replacement: $1

CodePudding user response:

In the event you discover more transformations you need to do, it might be easiest to simply use nested regexreplace() functions. This formula will match the desired results you show:

=arrayformula( 
  regexreplace( regexreplace( regexreplace( 
    trim(A2:A11), 
     "\b(\w) ",  "$1. " ), 
         "\. ",  "."    ), 
    "\.(\w\w)",  ". $1" ) 
)

To learn the exact regular expression syntax used by Google Sheets, see RE2.

  • Related