Home > Blockchain >  Regex to format comma-separated list from 1-per-line to n-per-line
Regex to format comma-separated list from 1-per-line to n-per-line

Time:03-15

I have a SQL query in a string as follows:

SELECT COL1,
       COL2,
       COL3,
       COL4,
       COL5,
       COL6,
       COL7,
       COL8,
       COL9,
       COL10,
       COL11
  FROM MY_TABLE

And I want to use regex to reformat it to

SELECT COL1, COL2, COL3, COL4,
       COL5, COL6, COL7, COL8,
       COL9, COL10, COL11
  FROM MY_TABLE

(the second line ideally has the indentation intact from before)

so that we reduce the number of lines the query can take especially when we have hundreds of columns. Is this possible? I would be implementing this regex in python if that helps! (A pythonic answer is also acceptable, though we should assume there's multiple Nested queries as well)

CodePudding user response:

This took a painfully long time to come up with, but we can try using re.sub here with a callback funtion:

inp = """SELECT COL1,
       COL2,
       COL3,
       COL4,
       COL5,
       COL6,
       COL7,
       COL8,
       COL9,
       COL10,
       COL11
  FROM MY_TABLE"""
output = re.sub(r'\w (?:,?\n\s*(?!FROM\b)\w ){0,3}', lambda m: ' '.join(m.group().split()), inp)
print(output)

This prints:

SELECT COL1, COL2, COL3, COL4,
       COL5, COL6, COL7, COL8,
       COL9, COL10, COL11
  FROM MY_TABLE

The regex pattern used here says to match:

\w                 a term in the select clause
(?:
    ,?             followed by optional comma
    \n\s*          newline and optional whitespace
    (?!FROM\b)\w   another term which is NOT 'FROM' (i.e. stop at FROM)
){0,3}             zero to three more select terms (so group of at most 4)
  • Related