Home > database >  How to use REGEXTRACT to extract certain characters between two strings
How to use REGEXTRACT to extract certain characters between two strings

Time:11-12

I am trying to extract a person's name between different characters. For example, the cells contains this information

PATIENT: 2029985 - COLLINS, JUNIOR .
PATIENT: 1235231-02 - JERRY JR, PATRICK .
PATIENT: 986435--EXP-- - JULIUS, DANIEL .
PATIENT: 2021118-02 - DRED-HARRY, KEVIN .

My goal is to use one REGEXTRACT formula to get the following:

COLLINS, JUNIOR
JERRY JR, PATRICK
JULIUS, DANIEL
LOVE ALSTON, BRENDA

So far, I have come up with the formula:

=ARRAYFORMULA(REGEXEXTRACT(B3:B, "-(.*)\."))

Where B3 contains the first information

Using that formula, I get:

COLLINS, JUNIOR
02 - JERRY JR, PATRICK
02 - LOVE-ALSTON, BRENDA 
-EXP-- - JULIUS, DANIEL
02 - DRED-HARRY, KEVIN

I managed to get the first name down but how do I go about extracting the rest.

CodePudding user response:

1st solution: With your shown samples, please try following regex.

Online demo for above regex

^PATIENT:.*-\s ([^.]*?)\s*\.

OR try following Google-sheet forumla:

=ARRAYFORMULA(REGEXEXTRACT(B3:B, "^PATIENT:.*-\s ([^.]*?)\s*\."))

Explanation: Checking if line/value starts from PATIENT followed by : till -(using greedy mechanism here), which is followed by spaces(1 or more occurrences). Then creating one and only capturing group which contains everything just before .(dot) in it making it non-greedy, closing capturing group which is followed by spaces(0 or more occurrences) followed by a literal dot.



2nd solution: Using lazy match approach in regex, please try following regex.

.*?\s-\s([^.]*?)\s*\.

Google-sheet formula will be as follows:

=ARRAYFORMULA(REGEXEXTRACT(B3:B, ".*?\s-\s([^.]*?)\s*\."))

Online demo for above regex

CodePudding user response:

You can use

=ARRAYFORMULA(REGEXEXTRACT(B3:B, "\s-\s ([^.]*?)\s*\."))

See the regex demo. Details:

  • \s-\s - a whitespace, -, one or more whitespaces
  • ([^.]*?) - Group 1: zero or more chars other than a . as few as possible
  • \s* - zero or more whitespaces
  • \. - a . char.
  • Related