Home > Mobile >  How do I standardize names like "Alice P S. T.Kong" to a format like "Alice P.S.T. Ko
How do I standardize names like "Alice P S. T.Kong" to a format like "Alice P.S.T. Ko

Time:01-15

I have thousands of names in column A whose spacing and punctuation varies. How do I trim repeated white space, get exactly one period and no white space between initials, and have exactly one space after initials and before the last name?

Names Desired results
Ajay K Thapar Ajay K. Thapar
Ajay K. Thapar Ajay K. Thapar
A.T Chamberlain A.T. Chamberlain
A. C. T. van Duin A.C.T. van Duin
Alice P S Kong Alice P.S. Kong
Alan T. K. Wan Alan T.K. Wan
Alan T K Wan Alan T.K. Wan
Alan T. K Wan Alan T.K. Wan
Alan T K.Wan Alan T.K. Wan
Alan T.K. Wan Alan T.K. Wan

CodePudding user response:

use:

=INDEX(BYROW(SPLIT(A1:A10, ". "), LAMBDA(x, TRIM(TEXTJOIN(, 1, 
 IF(REGEXMATCH(x, "^[A-Z]$"), x&".", " "&x&" "))))))

enter image description here

CodePudding user response:

You need to do the replacement in 2 iterations. One of possible solutions:

  1. Normalize dots and spaces after single letters, so that every single letter is followed by a dot and a space.

    pattern: \b([A-Z])([\s.] |$)

    replacement: $1.

  2. Strip spaces between multiple single letters.

    pattern: \b([A-Z]\.) (?=[A-Z]\.)

    replacement: $1

CodePudding user response:

In the event you need more complex transformations, it might be easiest to simply use nested regexreplace() functions. This formula will match the desired results you show:

=arrayformula( 
  regexreplace( regexreplace( regexreplace( 
    trim(A2:A11), 
    "\b(\w) ", "$1. " ), 
    "\. ", "." ), 
    "\.(\w\w)", ". $1" ) 
)

If you do not know the number of regular expressions to apply in advance, use this pattern:

=lambda(
  data, regexes, replaceWith, 
  byrow( 
    data, 
    lambda( 
      row, 
      if( 
        len(row), 
        reduce( 
          row, sequence(counta(regexes)), 
          lambda( 
            acc, regexIndex, 
            regexreplace( 
              acc, 
              index(regexes, regexIndex), 
              index(replaceWith, regexIndex) 
            ) 
          ) 
        ), 
        iferror(1/0)  
      ) 
    ) 
  ) 
)( 
  arrayformula(trim(A2:A)), 
  { "\b(\w) ", "\. ", "\.(\w\w)" }, 
  { "$1. ", ".", ". $1" }  
)

You can replace the two { array expressions } at the end with references to filter() formulas that fetch regular expressions from a range of cells.

To learn the exact regular expression syntax used by Google Sheets, see RE2.

  • Related