I have thousands of names in column A
whose spacing and punctuation varies. How do I trim repeated white space, get exactly one period and no white space between initials, and have exactly one space after initials and before the last name?
Names | Desired results |
---|---|
Ajay K Thapar | Ajay K. Thapar |
Ajay K. Thapar | Ajay K. Thapar |
A.T Chamberlain | A.T. Chamberlain |
A. C. T. van Duin | A.C.T. van Duin |
Alice P S Kong | Alice P.S. Kong |
Alan T. K. Wan | Alan T.K. Wan |
Alan T K Wan | Alan T.K. Wan |
Alan T. K Wan | Alan T.K. Wan |
Alan T K.Wan | Alan T.K. Wan |
Alan T.K. Wan | Alan T.K. Wan |
CodePudding user response:
use:
=INDEX(BYROW(SPLIT(A1:A10, ". "), LAMBDA(x, TRIM(TEXTJOIN(, 1,
IF(REGEXMATCH(x, "^[A-Z]$"), x&".", " "&x&" "))))))
CodePudding user response:
You need to do the replacement in 2 iterations. One of possible solutions:
Normalize dots and spaces after single letters, so that every single letter is followed by a dot and a space.
pattern:
\b([A-Z])([\s.] |$)
replacement:
$1.
Strip spaces between multiple single letters.
pattern:
\b([A-Z]\.) (?=[A-Z]\.)
replacement:
$1
CodePudding user response:
In the event you need more complex transformations, it might be easiest to simply use nested regexreplace()
functions. This formula will match the desired results you show:
=arrayformula(
regexreplace( regexreplace( regexreplace(
trim(A2:A11),
"\b(\w) ", "$1. " ),
"\. ", "." ),
"\.(\w\w)", ". $1" )
)
If you do not know the number of regular expressions to apply in advance, use this pattern:
=lambda(
data, regexes, replaceWith,
byrow(
data,
lambda(
row,
if(
len(row),
reduce(
row, sequence(counta(regexes)),
lambda(
acc, regexIndex,
regexreplace(
acc,
index(regexes, regexIndex),
index(replaceWith, regexIndex)
)
)
),
iferror(1/0)
)
)
)
)(
arrayformula(trim(A2:A)),
{ "\b(\w) ", "\. ", "\.(\w\w)" },
{ "$1. ", ".", ". $1" }
)
You can replace the two { array expressions } at the end with references to filter()
formulas that fetch regular expressions from a range of cells.
To learn the exact regular expression syntax used by Google Sheets, see RE2.