Home > Net >  Get INSERT INTO statement from SQL statements string in Python
Get INSERT INTO statement from SQL statements string in Python

Time:11-17

I have a string like the following:

sql = """DROP TABLE IF EXISTS table1;

ALTER TABLE table1 DROP PRIMARY KEY;

INSERT INTO table1 (id, created, name, telefonnummer, erPatient_id) VALUES
    (1, '2015-08-06 12;09:08', ' ', ' ', 16528),
    (2, '2015-08-06 12:43:11', ' ', ' ', 16529)
;

INSERT INTO table2 (comment, id) VALUES
('hello this is a semicolon ;', 2);"""

And I would like to get the statement INSERT INTO table1:

INSERT INTO table1 (id, created, name, telefonnummer, erPatient_id) VALUES
        (1, '2015-08-06 12;09:08', ' ', ' ', 16528),
        (2, '2015-08-06 12:43:11', ' ', ' ', 16529)
    ;

I can't split the string with sql.split(';) because there are semicolons within the VALUES to insert.

I have tried with regex with no success:

import re
pattern_string = r"INSERT INTO bezugsperson[(]*[^)] \)[^)]"
q = re.findall(pattern_string, data, re.MULTILINE | re.DOTALL)

In the real string there will be thousands of values to insert and dozens of tables.

CodePudding user response:

Use

import re
pattern_string = r"\bINSERT INTO \w \s.*?;\s*$"
q = re.findall(pattern_string, data, re.MULTILINE | re.DOTALL)

See regex proof.

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  INSERT INTO              'INSERT INTO '
--------------------------------------------------------------------------------
  \w                       word characters 1 or more times 
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  .*?                      any character (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  ;                        ';'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        end of a line

CodePudding user response:

Regular expression to capture everything between INSERT INTO table1 and \n;:

pattern_string = "INSERT INTO table1 .*\n;"
q = re.findall(pattern_string, sql, re.DOTALL)
  • Related