I am trying to extract a person's name between different characters. For example, the cells contains this information
PATIENT: 2029985 - COLLINS, JUNIOR .
PATIENT: 1235231-02 - JERRY JR, PATRICK .
PATIENT: 986435--EXP-- - JULIUS, DANIEL .
PATIENT: 2021118-02 - DRED-HARRY, KEVIN .
My goal is to use one REGEXTRACT formula to get the following:
COLLINS, JUNIOR
JERRY JR, PATRICK
JULIUS, DANIEL
LOVE ALSTON, BRENDA
So far, I have come up with the formula:
=ARRAYFORMULA(REGEXEXTRACT(B3:B, "-(.*)\."))
Where B3 contains the first information
Using that formula, I get:
COLLINS, JUNIOR
02 - JERRY JR, PATRICK
02 - LOVE-ALSTON, BRENDA
-EXP-- - JULIUS, DANIEL
02 - DRED-HARRY, KEVIN
I managed to get the first name down but how do I go about extracting the rest.
CodePudding user response:
1st solution: With your shown samples, please try following regex.
^PATIENT:.*-\s ([^.]*?)\s*\.
OR try following Google-sheet forumla:
=ARRAYFORMULA(REGEXEXTRACT(B3:B, "^PATIENT:.*-\s ([^.]*?)\s*\."))
Explanation: Checking if line/value starts from PATIENT followed by :
till -
(using greedy mechanism here), which is followed by spaces(1 or more occurrences). Then creating one and only capturing group which contains everything just before .
(dot) in it making it non-greedy, closing capturing group which is followed by spaces(0 or more occurrences) followed by a literal dot.
2nd solution: Using lazy match approach in regex, please try following regex.
.*?\s-\s([^.]*?)\s*\.
Google-sheet formula will be as follows:
=ARRAYFORMULA(REGEXEXTRACT(B3:B, ".*?\s-\s([^.]*?)\s*\."))
CodePudding user response:
You can use
=ARRAYFORMULA(REGEXEXTRACT(B3:B, "\s-\s ([^.]*?)\s*\."))
See the regex demo. Details:
\s-\s
- a whitespace,-
, one or more whitespaces([^.]*?)
- Group 1: zero or more chars other than a.
as few as possible\s*
- zero or more whitespaces\.
- a.
char.