Home > Software engineering >  Splitting a string by comma that is preceded by a specific word
Splitting a string by comma that is preceded by a specific word

Time:02-03

I want to split a query by comma only if it is preceded by AS.

A small part of My Query:

DATE_TRUNC('month', timestamp) AS month_begin_dt
        ,   FIRST_VALUE(monitorsessionid) OVER(PARTITION BY openpsid,DATE_TRUNC('month', timestamp) ORDER BY timestamp DESC) AS monitorsessionid
        ,   FIRST_VALUE(vrr) OVER(PARTITION BY openpsid,DATE_TRUNC('month', timestamp) ORDER BY timestamp DESC) AS vrr

I tried to split the query in java using the split() method with the following regular expression.

String[] queryArray = internalQuery.split(",(?![^\\(]*\\))");

I want my split array to have elements as:

[0] = DATE_TRUNC('month', timestamp) AS month_begin_dt

[1] = FIRST_VALUE(monitorsessionid) OVER(PARTITION BY openpsid,DATE_TRUNC('month', timestamp) ORDER BY timestamp DESC) AS monitorsessionid

[2] = FIRST_VALUE(vrr) OVER(PARTITION BY openpsid,DATE_TRUNC('month', timestamp) ORDER BY timestamp DESC) AS vrr

The output I am getting in the queryArray is:

[0] = DATE_TRUNC('month', timestamp) AS month_begin_dt

[1] = FIRST_VALUE(monitorsessionid) OVER(PARTITION BY openpsid

[2] = DATE_TRUNC('month', timestamp) ORDER BY timestamp DESC) AS monitorsessionid

[3] = FIRST_VALUE(vrr) OVER(PARTITION BY openpsid

[4] = DATE_TRUNC('month', timestamp) ORDER BY timestamp DESC) AS vrr

The Desired Output in the queryArray

[0] = DATE_TRUNC('month', timestamp) AS month_begin_dt

[1] = FIRST_VALUE(monitorsessionid) OVER(PARTITION BY
openpsid,DATE_TRUNC('month', timestamp) ORDER BY timestamp DESC) AS monitorsessionid

[2] = FIRST_VALUE(vrr) OVER(PARTITION BY openpsid,DATE_TRUNC('month', timestamp) ORDER BY timestamp DESC) AS vrr

CodePudding user response:

In Java you can use this regex for splitting:

(?<=\s{1,99}[aA][sS]\s{1,99}\w{1,99})\s*,\s*

We are using finite range quantifiers {1,99} instead of just here because Java regex engine doesn't allow dynamic length look behind assertions.

RegEx Demo

Java Code:

String[] queryArray = internalQuery.split(
   "(?<=\\s{1,99}[aA][aS]\\s{1,99}\\w{1,99})\\s*,\\s*");

RegEx Breakdown:

  • (?<=\s{1,99}[aA][sS]\s{1,99}\w{1,99}): Lookbehind assertion that makes sure that we have word AS (ignore case) surrounded with 1 to 99 whitespaces on either side followed by a 1 (upto 99) word characters as alias name
  • \s*,\s*: Match a comma surrounded with 0 or more whitespaces on either side
  • Related