Regex to extract each alphanumeric pattern

Time:11-18

I have a string like the following:

19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit

How can I write a regex that gives me these two separate strings?

19990101 - John DoeLorem ipsum dolor sit amet

19990102 - Elton Johnconsectetur adipiscing elit

The regex I wrote works up to this

/\d+ -/gm

But I don't know how to include the letters that follow as well.

CodePudding user response:

For the OP's use case, a regex-based split such as str.split(/(?<=\w)\s+(?=\d)/) should already do it.

The regex uses lookarounds: it matches any whitespace sequence (\s+) that is preceded, via the lookbehind (?<= ... ), by a word character (\w) and followed, via the lookahead (?= ... ), by a digit (\d).

console.log(
  '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit 19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit'
    .split(/(?<=\w)\s+(?=\d)/)
);

CodePudding user response:

You can use

const text = '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit';
console.log(text.match(/\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g))
console.log(text.split(/(?!^)\s+(?=\d+\s+-)/))

The text.match(/\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g) approach extracts the alphanumeric/whitespace chars that follow the \d+\s+- pattern. Details:

  • \d+ - one or more digits
  • \s+ - one or more whitespaces
  • - - a hyphen
  • [A-Za-z0-9\s]* - zero or more alphanumeric or whitespace chars
  • [A-Za-z] - a letter
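
As a minimal sketch of the breakdown above, the snippet below runs the match-based pattern on the question's sample string. The variable names sample, entryPattern and entries are illustrative and not part of the answer. Note that the final [A-Za-z] forces every match to end on a letter, so the greedy character class backtracks off the digits of the next entry.

```javascript
// Sample string from the question.
const sample = '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit';

// \d+ digits, \s+ whitespace, a hyphen, then letters/digits/whitespace,
// with the match required to end on a letter.
const entryPattern = /\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g;

// With the g flag, match() returns every match, or null when there is none.
const entries = sample.match(entryPattern) ?? [];

console.log(entries);
// [ '19990101 - John DoeLorem ipsum dolor sit amet',
//   '19990102 - Elton Johnconsectetur adipiscing elit' ]
```

Because the character class only allows letters, digits and whitespace, an entry whose text contains punctuation would be cut short; in that case the split-based approach is probably the safer choice.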

The text.split(/(?!^)\s+(?=\d+\s+-)/) splitting approach breaks the string at one or more whitespaces that are followed by one or more digits, one or more whitespaces, and a hyphen:

  • (?!^) - not at the start of string
  • \s+ - one or more whitespaces
  • (?=\d+\s+-) - a positive lookahead that matches a location immediately followed by one or more digits, one or more whitespaces, and a hyphen.
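
The split-based pattern can be sketched the same way. The helper name splitEntries and the second input with a leading space are assumptions made for illustration; the second call shows what (?!^) is for: without it, whitespace at the very start of the string would produce an empty first element.

```javascript
// Hypothetical helper (not from the answer): split a run of
// "<digits> - <text>" entries into separate strings.
function splitEntries(text) {
  // (?!^)        do not split at the very start of the string
  // \s+          the whitespace run consumed as the separator
  // (?=\d+\s+-)  only where "<digits><whitespace>-" follows (nothing consumed)
  return text.split(/(?!^)\s+(?=\d+\s+-)/);
}

const parts = splitEntries(
  '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit'
);
console.log(parts);
// [ '19990101 - John DoeLorem ipsum dolor sit amet',
//   '19990102 - Elton Johnconsectetur adipiscing elit' ]

// Leading whitespace is kept in the first entry instead of producing ''.
const withLeadingSpace = splitEntries(' 19990101 - a 19990102 - b');
console.log(withLeadingSpace);
// [ ' 19990101 - a', '19990102 - b' ]
```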