Is it possible for regex to recognize whether a column is a string or an int?

Time:08-27

My goal is to split each line into named groups, one per column. The problem is that my current regex assigns part of the line to the wrong group.

The regex:

^(?: {2,})?(?P<TANGGAL>[0-9]{2}/[0-9]{2}){0,1}(?: {2,})?(?P<KETERANGAN1>[\w-/:]+(?: [\w-/:]+)*){0,1}(?: {2,})?(?P<KETERANGAN2>[\w-/:]+(?: [\w-/:]+)*){0,1}(?: {2})?(?P<SALDO>[\d,.]+){0,1}

The string:

      01/07          SALDO AWAL                                                                                                                       1,000.00

The problem: The regex captures:

  1. 1 from the string 1,000.00 as Group KETERANGAN2 instead of Group SALDO.
  2. ,000.00 as Group SALDO instead of capturing the whole 1,000.00.

CodePudding user response:

You can make the capturing groups obligatory and move each one inside an optional non-capturing group that also matches the column delimiter in front of it:

^(?: {2,}(?P<TANGGAL>[0-9]{2}/[0-9]{2}))?(?: {2,}(?P<KETERANGAN1>[\w/:-]+(?: [\w/:-]+)*))?(?: {2,}(?P<KETERANGAN2>[\w/:-]+(?: [\w/:-]+)*))?(?: {2,}(?P<SALDO>[\d,.]+))?$


Note the added $ end-of-string anchor. It is necessary to make sure the whole line is matched: without it, the match can stop early, with KETERANGAN2 taking the 1 and SALDO left empty.
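To see the effect of the anchor, here is a minimal Python sketch. It assumes Python's re module (the question does not say which engine is used) and a shortened copy of the sample line with fewer padding spaces; the names BODY, unanchored and anchored are only for illustration.

```python
import re

# The column pattern from this answer, without the trailing $ anchor.
BODY = (
    r"^(?: {2,}(?P<TANGGAL>[0-9]{2}/[0-9]{2}))?"
    r"(?: {2,}(?P<KETERANGAN1>[\w/:-]+(?: [\w/:-]+)*))?"
    r"(?: {2,}(?P<KETERANGAN2>[\w/:-]+(?: [\w/:-]+)*))?"
    r"(?: {2,}(?P<SALDO>[\d,.]+))?"
)

# Assumption: a shortened version of the sample line from the question.
line = "      01/07          SALDO AWAL          1,000.00"

unanchored = re.match(BODY, line)        # may stop before the end of the line
anchored = re.match(BODY + "$", line)    # must consume the whole line

# Without $, KETERANGAN2 grabs the leading "1" and SALDO stays unmatched.
print(unanchored.group("KETERANGAN2"), unanchored.group("SALDO"))

# With $, the engine backtracks and gives the whole amount to SALDO.
print(anchored.group("KETERANGAN2"), anchored.group("SALDO"))
```

With the anchor in place, the engine has to give up the partial KETERANGAN2 match because "," cannot be consumed by any remaining group, so the amount ends up in SALDO as a whole.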

Details:

  • ^ - start of string
  • (?: {2,}(?P<TANGGAL>[0-9]{2}/[0-9]{2}))? - an optional non-capturing group matching two or more spaces and then capturing into Group "TANGGAL" two digits, /, two digits
  • (?: {2,}(?P<KETERANGAN1>[\w/:-]+(?: [\w/:-]+)*))? - an optional non-capturing group matching two or more spaces and then capturing into Group "KETERANGAN1" one or more word, /, : or - chars and then zero or more sequences of a space and then one or more word, /, : or - chars
  • (?: {2,}(?P<KETERANGAN2>[\w/:-]+(?: [\w/:-]+)*))? - an optional non-capturing group matching two or more spaces and then capturing into Group "KETERANGAN2" one or more word, /, : or - chars and then zero or more sequences of a space and then one or more word, /, : or - chars
  • (?: {2,}(?P<SALDO>[\d,.]+))? - an optional non-capturing group matching two or more spaces and then capturing into Group "SALDO" one or more digits, , or . chars
  • $ - end of string.
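
Putting the pieces together, a small parsing helper could look like the sketch below. It again assumes Python's re module; parse_line, ROW and the rstrip() call (to tolerate trailing whitespace before $) are my own additions, and the second and third test lines are made up for illustration.

```python
import re

# The full pattern from this answer, split per column for readability.
ROW = re.compile(
    r"^(?: {2,}(?P<TANGGAL>[0-9]{2}/[0-9]{2}))?"
    r"(?: {2,}(?P<KETERANGAN1>[\w/:-]+(?: [\w/:-]+)*))?"
    r"(?: {2,}(?P<KETERANGAN2>[\w/:-]+(?: [\w/:-]+)*))?"
    r"(?: {2,}(?P<SALDO>[\d,.]+))?$"
)


def parse_line(line):
    """Return a dict of column values (None for missing columns), or None."""
    # Assumption: trailing whitespace carries no meaning, so strip it first.
    match = ROW.match(line.rstrip())
    return match.groupdict() if match else None


# The sample line from the question, rebuilt with explicit padding.
sample = " " * 6 + "01/07" + " " * 10 + "SALDO AWAL" + " " * 40 + "1,000.00"
row = parse_line(sample)
print(row)

# Hypothetical line with only a date and an amount.
short_row = parse_line("      02/07          500.00")
print(short_row)

# Hypothetical amount without , or . : it also fits the KETERANGAN1 class,
# and the first matching column wins.
plain_row = parse_line("      02/07          500")
print(plain_row)
```

The last case touches on the question in the title: the regex does not know about column types, only about which characters each group accepts. A value made of digits alone fits both the KETERANGAN and the SALDO character classes, so it lands in the first column that can take it.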