I have thousands of names in column A I must standardize the spelling due to simple mistakes with white spaces and periods.
I have names like
Ajay K Thapar
Ajay K. Thapar
or
Alan T. K. Wan
Alan T K Wan
Alan T. K Wan
Alan T.K. Wan
Alan T.K Wan
Alan T.K.Wan
I am having a hard time getting a regex to
- Remove space between single letters with a period or without
Alan T. K. Wan = Alan T.K. Wan
A. C. T. van Duin = A.C.T. van Duin
A. C.T. van Duin = A.C.T. van Duin
- Place a period if it is Missing on single letters.
Alan T K. Wan = Alan T.K. Wan
A C T van Duin = A.C.T. van Duin
- Add space if missing after the last period.
Alan T.K.Wan = Alan T.K. Wan
A.C.T.van Duin = A.C.T. van Duin
Name | Fixed |
---|---|
Ajay K Thapar | Ajay K. Thapar |
Ajay K. Thapar | Ajay K. Thapar |
A.T Chamberlain | A.T. Chamberlain |
A. C. T. van Duin | A.C.T. van Duin |
Alice P S Kong | Alice P.S. Kong |
Alan T. K. Wan | Alan T.K. Wan |
Alan T K Wan | Alan T.K. Wan |
Alan T. K Wan | Alan T.K. Wan |
Alan T K.Wan | Alan T.K. Wan |
Alan T.K. Wan | Alan T.K. Wan |
CodePudding user response:
use:
=INDEX(BYROW(SPLIT(A1:A10, ". "), LAMBDA(x, TRIM(TEXTJOIN(, 1,
IF(REGEXMATCH(x, "^[A-Z]$"), x&".", " "&x&" "))))))
CodePudding user response:
You need to do the replacement in 2 iterations. One of possible solutions:
Normalize dots and spaces after single letters, so that every single letter is followed by a dot and a space.
pattern:
\b([A-Z])([\s.] |$)
replacement:
$1.
Strip spaces between multiple single letters.
pattern:
\b([A-Z]\.) (?=[A-Z]\.)
replacement:
$1
CodePudding user response:
In the event you discover more transformations you need to do, it might be easiest to simply use nested regexreplace()
functions. This formula will match the desired results you show:
=arrayformula(
regexreplace( regexreplace( regexreplace(
trim(A2:A11),
"\b(\w) ", "$1. " ),
"\. ", "." ),
"\.(\w\w)", ". $1" )
)
To learn the exact regular expression syntax used by Google Sheets, see RE2.