My goal is to split the string into named groups. The problem is that my current regex does not capture the last part correctly.
The regex:
^(?: {2,})?(?P<TANGGAL>[0-9]{2}/[0-9]{2}){0,1}(?: {2,})?(?P<KETERANGAN1>[\w-/:]+(?: [\w-/:]+)*){0,1}(?: {2,})?(?P<KETERANGAN2>[\w-/:]+(?: [\w-/:]+)*){0,1}(?: {2})?(?P<SALDO>[\d,.]+){0,1}
The string:
01/07 SALDO AWAL 1,000.00
The problem: the regex captures 1 from the string 1,000.00 as Group KETERANGAN2 instead of Group SALDO, and it captures ,000.00 as Group SALDO instead of capturing the whole 1,000.00.
CodePudding user response:
You can make the optional capturing groups obligatory and move each of them inside an optional non-capturing group that also matches the column delimiter:
^(?: {2,}(?P<TANGGAL>[0-9]{2}/[0-9]{2}))?(?: {2,}(?P<KETERANGAN1>[\w/:-]+(?: [\w/:-]+)*))?(?: {2,}(?P<KETERANGAN2>[\w/:-]+(?: [\w/:-]+)*))?(?: {2,}(?P<SALDO>[\d,.]+))?$
Note the added $ end-of-string anchor: it is necessary to make sure the whole line is matched.
Details:
^ - start of string
(?: {2,}(?P<TANGGAL>[0-9]{2}/[0-9]{2}))? - an optional non-capturing group matching two or more spaces and then capturing into Group "TANGGAL" two digits, /, two digits
(?: {2,}(?P<KETERANGAN1>[\w/:-]+(?: [\w/:-]+)*))? - an optional non-capturing group matching two or more spaces and then capturing into Group "KETERANGAN1" one or more word, /, : or - chars and then zero or more sequences of a space and then one or more word, /, : or - chars
(?: {2,}(?P<KETERANGAN2>[\w/:-]+(?: [\w/:-]+)*))? - an optional non-capturing group matching two or more spaces and then capturing into Group "KETERANGAN2" one or more word, /, : or - chars and then zero or more sequences of a space and then one or more word, /, : or - chars
(?: {2,}(?P<SALDO>[\d,.]+))? - an optional non-capturing group matching two or more spaces and then capturing into Group "SALDO" one or more digits, , or . chars
$ - end of string.
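As a quick check, the pattern above can be tried in Python, whose re module supports the (?P<name>...) syntax. This is only a sketch: the sample line below is an assumption, written with two leading spaces and two spaces between columns, because the exact whitespace of the original line is not known. The second pattern, with the trailing $ dropped, is included only to illustrate why the anchor matters.

```python
import re

# Pattern from the answer: each capturing group sits inside an optional
# non-capturing group that also consumes the two-or-more-space delimiter.
PATTERN = re.compile(
    r"^(?: {2,}(?P<TANGGAL>[0-9]{2}/[0-9]{2}))?"
    r"(?: {2,}(?P<KETERANGAN1>[\w/:-]+(?: [\w/:-]+)*))?"
    r"(?: {2,}(?P<KETERANGAN2>[\w/:-]+(?: [\w/:-]+)*))?"
    r"(?: {2,}(?P<SALDO>[\d,.]+))?$"
)

# Assumed sample line: the column spacing here is a guess, not the
# original data.
line = "  01/07  SALDO AWAL  1,000.00"

match = PATTERN.match(line)
groups = match.groupdict() if match else {}
print(groups)

# Same pattern without the trailing "$": the match may stop early, so the
# leading "1" of the amount is taken by KETERANGAN2 and SALDO stays empty.
unanchored = re.compile(PATTERN.pattern[:-1])
loose = unanchored.match(line).groupdict()
print(loose)
```

With the anchor in place, the engine has to backtrack out of KETERANGAN2 so that SALDO can consume the whole amount; without it, the first partial match is accepted.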