I have a string like the following:
19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit
How can I write a regex that gives me these two separate strings?
19990101 - John DoeLorem ipsum dolor sit amet
19990102 - Elton Johnconsectetur adipiscing elit
The regex I wrote works up to this point:
/\d+ -/gm
But I don't know how to include the letters that follow as well.
CodePudding user response:
For the OP's use case, a regex-based split such as str.split(/(?<=\w)\s+(?=\d)/) should already do it.
The regex uses lookarounds: it matches any whitespace sequence (\s+) that is both preceded (?<= ... ) by a word character (\w) and followed (?= ... ) by a digit (\d).
console.log(
'19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit 19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit'
.split(/(?<=\w)\s+(?=\d)/)
);
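To illustrate why the split keeps the surrounding letter and digit, here is a minimal sketch that inspects what the lookaround regex actually consumes. The variable names (boundary, probe, hit, parts) and the short probe string are made up for illustration; only the regex itself comes from the answer.

```javascript
// Lookarounds are zero-width: they check context but consume no characters.
const boundary = /(?<=\w)\s+(?=\d)/;

// Hypothetical short sample: the end of one entry and the start of the next.
const probe = 'amet 19990102';

// The match contains only the whitespace between the two entries.
const hit = probe.match(boundary);
console.log(JSON.stringify(hit[0]), hit.index);

// Because only the whitespace is consumed, both neighbours survive the split.
const parts = probe.split(boundary);
console.log(parts);
```

Note that lookbehind (?<= ... ) needs a reasonably recent JavaScript engine; older runtimes reject the pattern with a syntax error.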
CodePudding user response:
You can use
const text = '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit';
console.log(text.match(/\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g))
console.log(text.split(/(?!^)\s+(?=\d+\s+-)/))
The text.match(/\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g) approach extracts the alphanumeric/whitespace chars that follow the \d+\s+- pattern. Details:
\d+ - one or more digits
\s+ - one or more whitespaces
- - a hyphen
[A-Za-z0-9\s]* - zero or more alphanumeric or whitespace chars
[A-Za-z] - a letter
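As a minimal sketch, the match-based pattern can be wrapped in a small helper. The name extractEntries is hypothetical and not part of the answer; the sketch assumes every entry starts with digits, whitespace, and a hyphen, and that its text ends in a letter.

```javascript
// Hypothetical helper: returns every "digits - text" entry found in the input.
function extractEntries(text) {
  // \d+\s+-          one or more digits, whitespace, then a hyphen
  // [A-Za-z0-9\s]*   greedy run of alphanumeric or whitespace chars
  // [A-Za-z]         forces the match to end on a letter
  const pattern = /\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g;
  // String.prototype.match returns null when nothing matches.
  return text.match(pattern) || [];
}

const sample = '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit';
const entries = extractEntries(sample);
console.log(entries);
```

The greedy character class first runs past the next date, stops at its hyphen, and then backtracks until the last char is a letter, which is why each match ends at the final word of its entry. For the same reason, an entry whose text ends in a digit or punctuation would be cut short by this pattern.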
The text.split(/(?!^)\s+(?=\d+\s+-)/) splitting approach breaks the string at one or more whitespaces that come before one or more digits, one or more whitespaces, and a hyphen:
(?!^) - not at the start of string
\s+ - one or more whitespaces
(?=\d+\s+-) - a positive lookahead that matches a location that is immediately followed by one or more digits, one or more whitespaces, and a hyphen.
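The stricter lookahead matters once the text itself contains numbers. The sketch below compares a split that only looks for a following digit with the split that requires digits, whitespace, and a hyphen. The helper names (looseSplit, strictSplit) and the sample string are made up for illustration.

```javascript
// Hypothetical helpers wrapping the two split patterns.
const looseSplit = (s) => s.split(/(?<=\w)\s+(?=\d)/);
const strictSplit = (s) => s.split(/(?!^)\s+(?=\d+\s+-)/);

// Made-up sample whose first entry contains a number ("room 42").
const tricky = '19990101 - John Doe booked room 42 today 19990102 - Elton John called';

// Splits before "42" as well, producing three pieces.
const loose = looseSplit(tricky);
console.log(loose);

// Splits only before "19990102 -", producing two pieces.
const strict = strictSplit(tricky);
console.log(strict);
```

Under these assumptions the loose pattern is fine for text without digits, while the strict one tolerates numbers in the text as long as they are not followed by whitespace and a hyphen.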